LEE


Concept attainment by 151 Ss was studied under 4 conditions of information
transmission equated in value: (a) initial positive instance, (b) initial
positive instance plus list of possible hypotheses remaining, (c) initial
positive instance plus list of hypotheses eliminated, and (d) initial positive
instance and exposure to array comprised only of positive instances
of possible remaining hypotheses. The task was attainment of a single
2-attribute concept by means of free selection from an array completely
visible to S. Results indicated: (a) groups did not differ in number of
instances required for solution or in relevance of 1st verbalized hypotheses;
(b) inefficient performance was not attributable to failure of
information assimilation, but Ss did not utilize all available information;
(c) Conditions II and IV resulted in a significantly greater number of
redundant hypotheses.





Recent research in concept attainment 
has sharply delineated the necessity for 
control of the amount of information 
available to subjects exposed to different 
experimental conditions (Hovland, 
1952; Hovland & Weiss, 1953; Wallace
& Sechrest, 1961). Failure to equate the 
amount of information across treatments 
yields equivocal results in which pre- 
sumed treatment effects are hopelessly 
confounded with differential amounts of 
information. While control of objective 
information is clearly a desideratum, 
scant attention has been paid to the 
possible influence of the formal method 


1 This investigation was completed while the 
second author was on a United States Public 
Health predoctoral research fellowship. The 
writers wish to thank Charles Hempel for as- 
sistance in data collection and David Isaacs for 
a careful reading of the manuscript. 




of information transmission in concept 
attainment research. Cahill and Hov- 
land (1960) have shown that successive 
presentation of instances results in a 
poorer performance than simultaneous 
presentation of instances. Since the ob- 
jective amount of information trans- 
mitted to subjects was the same regard- 
less of experimental condition, Cahill 
and Hovland concluded that the method 
of transmission was significantly related 
to performance. Subjects serving in the 
successive instances condition were re- 
quired to retain previous instances in 
memory rather than having such in- 
formation directly available for inspec- 
tion as did subjects serving in the simul- 
taneous instances condition. 

The major purpose of the present in- 
vestigation was to compare subjects’ 
assimilation and utilization of information
under varied conditions of information
transmission. Although in a closed
system of concepts an initial positive 
instance eliminates many possible con- 
cepts, if all the information inherent in 
the instance is used, it is by no means 
certain that subjects will recognize and 
employ all the information available to 
them. However, an imperfect or less- 
than-maximally efficient performance 
may be attributed either to a failure to 
use all available information from an 
initial positive instance or to the subse- 
quent utilization of inefficient strategy, 
e.g., redundant choices. In this investi- 
gation four methods of information 
transmission were employed. Condition 
I consisted simply of presentation of a 
single positive instance of the concept to 
be attained. Condition II, in addition 
to a single positive instance of the con- 
cept to be attained, included the pres- 
entation of a printed list of the remain- 
ing possible hypotheses as to the nature 
of the concept. Similarly, Condition III 
included both a single positive instance 
and a printed list of hypotheses. How- 
ever, the list of hypotheses included in 
Condition III comprised the hypotheses 
which were eliminated by the first posi- 
tive instance. In Condition IV a single 
positive instance was again presented, 
but the array of instances shown to the 
subject was reduced by eliminating all 
instances which were negative for all the 
possible hypotheses remaining after the 
initial positive instance. It should be 
noted that while the subjects in Condi- 
tions II and III received lists of hy- 
potheses, this information was theoreti- 
cally redundant in that the printed 
material contained the same information 
objectively transmitted by a single posi- 
tive instance. Thus, formal method of 
transmission varied across conditions 
but objective amount of information 
transmitted was identical. 


METHOD 


Materials 


The materials employed in this study con- 
sisted of arrays of cards constructed in a man- 
ner suggested by Bruner, Goodnow, and Austin 
(1956, p. 42). The full array employed in 
Conditions I, II, and III consisted of 81 in- 
stances, all possible combinations of four at- 
tributes exhibiting three values each as follows: 
form (square, triangle, cross), color (red, yellow,
black), number (one, two, three), and
borders (one, two, three). The attributes were 
displayed on white cards, 3” × 14”. The reduced
array employed in Condition IV consisted
of 33 instances. The instances comprising
the reduced array consisted of those relevant
to the six possible hypotheses remaining
after the presentation of a single positive in- 
stance. Each instance of this array was a posi- 
tive exemplar of at least one of the remaining 
six hypotheses. The reduced array was in- 
cluded in the present study to investigate the 
effects, if any, of restriction of the range of 
stimulus materials to instances relevant to 
hypotheses remaining after an initial positive 
instance. For example, the reduced array pre- 
vented a subject from making totally unin- 
formative choices, at least on the first choice. 
The array was also reduced in apparent com- 
plexity by the elimination of so many of the 
stimulus cards. Both the reduced array and 
the full array included nine positive exemplars 
of the concept to be attained. However, the 
probability of selection of a positive exemplar 
of the concept was 1/9 in the full array and 
3/11 in the reduced array. Thus, the reduced 
array diminished considerably the probability 
of selection of instances considerably removed 
from the remaining possible hypotheses follow- 
ing an initial positive instance. 

The printed lists of hypotheses employed in 
Conditions II and III were single spaced, typed 
sheets of regular 8½” × 11” white typing
paper. For each condition all of the hypothe- 
ses were listed on a single sheet of paper. In 
the case of Condition III the 48 hypotheses 
eliminated by the initial positive instance 
were grouped systematically by attributes and 
values. 
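
The counts given in this section can be checked by enumerating the array. The following sketch is not part of the original report; the data layout and the names `ATTRIBUTES`, `is_positive`, `focus`, and `target` are illustrative assumptions. It takes the initial positive instance named under Procedure (three red squares with two borders) and, as an arbitrary example, the concept "red square".

```python
from fractions import Fraction
from itertools import combinations, product

# Attributes and values as listed in the Materials section.
ATTRIBUTES = {
    "form": ("square", "triangle", "cross"),
    "color": ("red", "yellow", "black"),
    "number": ("one", "two", "three"),
    "borders": ("one", "two", "three"),
}

# Full array: one card for every combination of attribute values.
names = list(ATTRIBUTES)
full_array = [dict(zip(names, values)) for values in product(*ATTRIBUTES.values())]

# Every two-attribute conjunctive hypothesis, stored as two (attribute, value) pairs.
hypotheses = [
    ((a, va), (b, vb))
    for a, b in combinations(names, 2)
    for va in ATTRIBUTES[a]
    for vb in ATTRIBUTES[b]
]

def is_positive(card, hypothesis):
    """A card is a positive instance if it shows both values of the hypothesis."""
    return all(card[attr] == value for attr, value in hypothesis)

# Initial positive instance: three red squares with two borders.
focus = {"form": "square", "color": "red", "number": "three", "borders": "two"}

remaining = [h for h in hypotheses if is_positive(focus, h)]
eliminated = [h for h in hypotheses if not is_positive(focus, h)]

# Reduced array: cards that are positive for at least one remaining hypothesis.
reduced_array = [c for c in full_array if any(is_positive(c, h) for h in remaining)]

# Example target concept (an arbitrary choice among the six): red square.
target = (("form", "square"), ("color", "red"))
positives_full = sum(is_positive(c, target) for c in full_array)
positives_reduced = sum(is_positive(c, target) for c in reduced_array)

print(len(full_array), len(hypotheses), len(remaining), len(eliminated))
print(len(reduced_array), positives_full, positives_reduced)
print(Fraction(positives_full, len(full_array)),
      Fraction(positives_reduced, len(reduced_array)))
```

Under these assumptions the enumeration gives 81 cards, 54 hypotheses of which 6 remain and 48 are eliminated, a reduced array of 33 cards, and 9 positive exemplars in either array, so that the selection probabilities are 1/9 and 3/11.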


Procedure 


General instructions for all the subjects as 
well as instructions given to the subjects in 
Condition I were as follows: 


[General instructions] This is an experi- 
ment in what is called concept attainment. 
Throughout this session you will be asked 
to try to discover concepts (different ways
of grouping the materials) which I will have 
in mind. 

The materials we will be using are these 
cards you see before you. Notice that each 
of these cards displays a square, cross, or 
triangle. Also, these figures may be red, 
black, or yellow and are enclosed by one, 
two, or three borders. Notice that there are 
one, two, or three figures on each card. 

These features that I have just pointed out 
to you are called attributes. In this collec- 
tion of cards there are four attributes. These 
are (a) type of figure (square, cross, or triangle),
(b) color of figure (red, yellow, or
black), (c) number of borders (one, two, or 
three), and (d) number of figures (one, two, 
or three). 

It is possible to combine these attributes 
in many different ways forming categories 
in which a number of cards may be placed. 
For example, all cards with two crosses is a
category or concept to which you would 
assign all of the cards which have on them, 
two crosses. These categories are the con- 
cepts you will be asked to find. In each case, 
as in the example given, you will be asked to 
find categories involving only two attributes 
at a time. For example, you will not be 
asked to discover a category such as all cards 
with two red crosses since this would involve 
three attributes, number of figures, color of 
figures, and type of figures. 

Now, can you make up a category using 
these cards? Remember, it should involve 
only two attributes. 

If a subject misses the point, correct his mis- 
take, re-explain the nature of the task, and ask 
him to make up another category. If the sub- 
ject shows that he understands the task by 
correctly devising a two attribute category, 
proceed by saying: 


Now show me a few cards which describe 
the concept which you have made up. 


If the subject is again correct, say: 


Good, I think you have the idea of the 
type of concept we will be using in the 
experiment. 


[Specific instructions—Condition I] In the 
following problem I will begin by showing 
you a card which will be a positive example 
of the concept that I have in mind. After 
I show you the card you will be asked to 
select more cards from the board in front 
of you. After each one of your selections I 
will tell you whether the card you have 
chosen is positive or negative. If it is positive,
you then know that both of the attributes
that define the concept are present on
the card. If it is negative, you know that
both of the attributes that define the concept
are not present together on the card. You
may take as many cards as you wish and 
you may offer one suggestion as to what 
you think the concept is that I have in mind 
after each one of your selections. Your task 
is to try to figure out what the concept is 
that I have in mind using as few of these 
cards as possible. Try to avoid guessing and 
make a real attempt to use information you 
get from the cards in a logical manner. There 
are no tricks. Do you have any questions? 
If not, let’s begin. Here is the first card 
and it is a positive example of the concept 
which I have in mind. 


After explanation as to the type of concepts 
with which they would deal, the subjects were 
asked to discover the nature of a single two 
attribute conjunctive concept which the experimenter
had in mind. The array of instances as
well as instances selected from the array re- 
mained in full view throughout the experiment. 
Each subject, regardless of particular condition, 
was presented with a single positive instance of 
the concept to be attained. Subjects serving in
Condition II were given the list of six remain- 
ing hypotheses while Condition III subjects 
were given the list of 48 eliminated hypothe- 
ses. The single positive instance accompanied 
the lists in both conditions. Condition IV sub- 
jects were given the single positive instance 
and introduced to the reduced array without 
comment by the experimenter as to how the 
array differed from the full array. 

A subject was instructed to select instances 
from the array and after each of his selections 
was told whether he had selected a negative or 
positive exemplar of the concept. After each 
of his selections the subject could offer one and 
only one hypothesis as to the nature of the 
concept. Each hypothesis was either confirmed 
(in which case the experiment was ended) or 
invalidated by the experimenter. Experimenter 
responses to verbalized hypotheses were ap- 
propriately, “correct” and “no, that is not the 
concept that I have in mind.” Selection of 
cards from the arrays continued until the sub- 
ject verbalized the correct concept. 

While general instructions were identical for 
all subjects, additional instructions appropriate 
to each condition were included for Conditions 
II, III, and IV. While all subjects received the 
same initial positive instance, three red squares 
with two borders, the concept to be attained 
was systematically varied. In both arrays, full 
and reduced, a single positive instance eliminated
all but six hypotheses as to the nature of
the concept. These six remaining two-attribute
conjunctions were as follows: red square, red
and two borders, square and two borders, three 
red, three squares, three and two borders. The 
concept to be attained was systematically se- 
lected from this pool of remaining concepts 
following a single positive instance in such a 
manner that each appeared equally (approxi- 
mately) often in each condition. This pro- 
cedure was thought desirable in order to avoid 
unintentional but, nevertheless, possible com- 
munication among subjects from a common 
source and to sample more widely from the 
population of concepts available. In addition, 
systematic varying of the concept to be at- 
tained permitted an independent comparison of 
concepts qua concepts with regard to possible 
inherent levels of difficulty. This was thought 
to be, in itself, an interesting problem. 


Subjects 


The subjects were 151 college students, 76 
males and 75 females, enrolled in undergradu- 
ate introductory psychology courses. Thirty- 
eight subjects, evenly divided with regard to 
sex, were randomly assigned to Conditions II, 
III, and IV. Thirty-seven subjects, 19 males 
and 18 females, served under Condition I. 


RESULTS 


The analysis of variance on number 
of instances required to attain the con- 
cept is reported in Table 1, while means 
and standard deviations are given in 
Table 2. The results of this analysis are 
clearly not significant and indicate that 
the methods of transmission failed to 
produce reliable differences between 
conditions on number of instances re- 
quired for solution. With full recognition 
of current controversy over multiple 
comparisons following over-all F tests, 


TABLE 1 


SUMMARY OF ANALYSIS OF VARIANCE ON NUMBER
OF INSTANCES REQUIRED FOR SOLUTION


Source      df    MS    F

Between                 1.96
Within
Total

TABLE 2 


MEANS AND STANDARD DEVIATIONS FOR NUMBER
OF INSTANCES FOR CONDITIONS AND CONCEPTS


Instance                N     M      SD

Condition
  I                           3.32
  II                          3.16
  III                         3.26
  IV                          4.10
Concept
  Three red                   3.85   2.46
  Three squares               3.40   1.62
  Red two borders             3.00   1.75
  Red square                  3.96   2.49
  Square two borders          3.25   1.82
  Three two borders           3.12   1.54


the two most extreme means were com- 
pared through use of the new Duncan 
multiple range test (Edwards, 1960). 
This analysis was nonsignificant, and 
further pair-wise comparisons between 
means were not conducted. It should be 
noted that among available multiple 
analyses, the Duncan test is regarded as 
least conservative. 

While the analysis of variance on 
number of instances to solution was 
nonsignificant, it was thought that a 
more sensitive analysis might reveal dif- 
ferences among subjects in assimilation 
and utilization of information. For this 
reason, the first verbalized hypotheses 
were categorized as either relevant or as 
representing inference errors. A hypoth- 
esis was considered relevant if it were 
one of the six possible hypotheses re- 
maining after a single positive instance. 
On the other hand, a hypothesis was 
considered as constituting an inference 
error if it were not one of these six 
hypotheses. It is most interesting to note 
that out of 151 first verbalized hypothe- 
ses as to the nature of the concept 
sought, only four constituted inference 
errors. Two inference errors were com- 
mitted by Condition I subjects, none by 
Condition IV subjects, and one subject 
each in Conditions II and III. Approximately
50% of the subjects verbalized
a first hypothesis after one selection from 
the array and roughly 75% of the sub- 
jects after their second selection. The 
almost complete lack of inference errors 
in the total sample suggests that a ma- 
jority of the subjects made good use of 
available information quite early in the 
process of attainment. 

In addition to the analysis of inference 
errors, an analysis of individual selec- 
tions and hypotheses was conducted. At 
each point in the attainment process, 
selections and hypotheses were catego- 
rized as informative or redundant. A 
redundant selection was defined as one 
that failed to permit a subject to eliminate
one additional hypothesis not eliminated
by prior information. A hypothesis
was considered redundant if it
had been logically eliminated by a prior 
selection. The number of tenable hypoth- 
eses remaining after each selection and 
after each hypothesis was computed. 


With regard to influence of methods
of transmission, differences among
groups in the number of redundant selec- 
tions on the first trial are critical. Sig- 
nificant differences among groups in 
number of redundant first selections did 
not obtain. In the entire sample only 
one subject selected a totally redundant 
first instance. With the exception of 
Condition IV, reduced array, the prob- 
ability of a redundant selection by 
chance was 5/8. If the initial informa- 
tion presentations had conveyed no in- 
formation at all, redundant first selec- 
tions would have been expected for 
70 subjects. The expected number of 
redundant first selections was computed 
on 113 subjects in Conditions I, II, and
III. Condition IV subjects were ex- 
cluded from this computation since the 
probability of a redundant first selection 
in this group was zero. 
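
The expected count quoted above follows from simple arithmetic on the figures given in the text. The sketch below only restates that arithmetic and is not part of the original report; the variable names are illustrative, and the chance probability of 5/8 is taken from the text as given rather than derived.

```python
from fractions import Fraction

# Group sizes for the full-array conditions, from the Subjects section.
group_sizes = {"I": 37, "II": 38, "III": 38}

# Chance probability of a redundant first selection, as stated in the text.
p_redundant_by_chance = Fraction(5, 8)

n_full_array = sum(group_sizes.values())
expected_redundant = p_redundant_by_chance * n_full_array

# Observed: one totally redundant first selection in the entire sample.
observed_redundant = 1

print(n_full_array, float(expected_redundant), observed_redundant)
```

The product is 70.625, which the text reports as 70 subjects, against a single observed case.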

Groups did not differ in mean number
of tenable hypotheses remaining after
each selection nor after each hypothesis. 
However, the groups did differ in the 
number of redundant hypotheses 
emitted. An analysis of variance for the 
number of redundant hypotheses yielded 
an F significant at the .05 level. A χ²
for the number of subjects in each group 
who emitted redundant hypotheses was 
significant beyond the .01 level. Con- 
ditions II and IV were nearly equal and 
high in emission of redundant hypoth- 
eses while Conditions I and III were 
nearly equal and low. There were no 
differences in the total number of hy- 
potheses emitted. 

The analysis of inference errors, in 
addition to constituting further evidence 
for the nonsignificance of methods of 
transmission, raises the interesting ques- 
tion of the efficiency of the subjects in 
general with regard to the utilization 
of information. The failure to obtain 
significant differences in number of in- 
stances prior to solution between sub- 
jects given a single positive instance and 
subjects given direct verbal information 
is critical. This suggests that while the 
subjects assimilated information effi- 
ciently, improper utilization of informa- 
tion was an important factor in produc- 
ing less than perfect performance. In 
other words, in this experiment subjects’ 
strategies in the utilization of infor- 
mation appeared to be more highly 
related to performance than assimila- 
tion of objectively transmitted infor- 
mation. The virtual absence of infer- 
ence errors suggests that utilization was 
quite high. However, remembering the 
fact that a sizable minority of the sub- 
jects did not verbalize a first hypothesis 
until or after their third selection, the 
following analysis was conducted in 
order to assess the extent to which the 
subjects utilize information. 

Assume a group of perfectly logical 
subjects who assimilate and utilize perfectly
all available information. Taking
into account probabilities of chance solu- 
tions on successive trials and reasoning 
from the standpoint of a maximally effi- 
cient (one which affords the opportunity 
of progressive elimination of the greatest 
numbers of erroneous hypotheses) strat- 
egy, it is possible to compute expected 
group performance at successive stages 
of performance. Given an initial positive 
instance, in both full and reduced array 
conditions, the probability of a correct 
solution prior to the first selection from 
the array is 1/6. (A single positive in- 
stance eliminates all but six equally 
possible hypotheses.) Under the present 
experimental conditions, the optimal 
strategy is that of varying a single attri- 
bute at a time (conservative focusing). 
A subject who followed this strategy 
would have been able to eliminate three 
additional hypotheses by his first selec- 
tion, positive or negative, from the array. 
Thus, after the first selection and prior 
to the second selection, the probability 
of a correct solution is raised to 1/3.
Subsequent selections under this strategy 
reduce remaining hypotheses by two in 
the case of a negative selection and one 
per selection in the case of a positive 
selection. A negative selection on Trial 2
raises the probability of solution to
certainty while a positive selection
raises the probability of solution only
to 1/2. Regardless of selection on
Trial 2, negative or positive, under the 
assumptions outlined above, all the sub- 
jects should be expected to reach a solu- 
tion after their third selection and prior 
to their fourth selection. 
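
The reasoning of this paragraph can be traced with a short simulation. This is a minimal sketch and not the authors' procedure: it assumes that, after the initial positive instance, each remaining hypothesis is identified by the pair of attributes it involves, and that a card differing from the focus card on one attribute is negative exactly when that attribute belongs to the concept. The order in which attributes are varied is an arbitrary assumption, as are the names `HYPOTHESES` and `selections_to_solution`.

```python
from itertools import combinations

ATTRIBUTES = ("number", "color", "form", "borders")

# The six hypotheses left by the initial positive instance, one per attribute pair.
HYPOTHESES = [frozenset(pair) for pair in combinations(ATTRIBUTES, 2)]

def selections_to_solution(target, order=ATTRIBUTES):
    """Count selections conservative focusing needs to isolate `target`.

    Returns the number of selections and the count of tenable hypotheses
    before the first selection and after each selection.
    """
    remaining = list(HYPOTHESES)
    history = [len(remaining)]
    for varied in order:
        if len(remaining) == 1:
            break
        # Negative feedback means the varied attribute is part of the concept.
        relevant = varied in target
        remaining = [h for h in remaining if (varied in h) == relevant]
        history.append(len(remaining))
    return len(history) - 1, history

results = {tuple(sorted(t)): selections_to_solution(t) for t in HYPOTHESES}
for concept, (n_selections, history) in results.items():
    print(concept, n_selections, history)
```

In this sketch every target leaves three tenable hypotheses after the first selection and is isolated after at most three selections, which agrees with the statement that a perfectly logical subject reaches solution after the third selection and prior to the fourth.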

Comparison of expected performance 
with actual performance suggests that 
subjects as a group do not make full use 
of objectively available information. 
Whereas the expected number of subjects 
reaching solution after their third selec- 
tion is 151, only 80 subjects achieved 
solution at this point. Furthermore, 


TABLE 3


SUMMARY OF AN ANALYSIS OF VARIANCE ON
NUMBER OF INSTANCES REQUIRED TO ATTAIN
SPECIFIC CONCEPTS


Source      df     MS

Between      5    3.96
Within     145    3.86

since the subjects were permitted to 
verbalize a hypothesis after each selec- 
tion, the expected performance as com- 
puted above is conservative. This is 
possible since utilization of experimenter 
feedback constitutes an additional source 
of information and if used nonredun- 
dantly permits the elimination of one 
hypothesis per invalidation. Of a total 
of 755 hypotheses eliminated, 145, or 
approximately 19%, were eliminated 
through nonredundant experimenter 
feedback. 

With regard to possible differences in 
difficulty levels of concepts, since meth- 
ods of transmission were nonsignificant, 
it was possible to collapse the methods 
dimension forming six conditions of con- 
cepts containing approximately 25 sub- 
jects in each. Table 3 presents a sum- 
mary of an analysis of variance of total 
number of instances required to attain 
six different concepts, each subject solv- 
ing only one. As Table 3 shows, this 
analysis was also nonsignificant and in- 
dicated that, for the concepts employed 
in this experiment, differences in diffi- 
culty level did not obtain. 


DISCUSSION 


Bearing in mind the essential am- 
biguity of attempts to prove the null 
hypothesis, the failure to obtain signifi- 
cant effects by the methods of informa- 
tion transmission employed in this ex- 
periment is most interesting. The sub- 
jects were able to assimilate and utilize 
information transmitted by a single positive
instance of a two-attribute conjunctive
concept as effectively as the subjects
given identical information via direct ver- 
bal methods. This equality of perform- 
ance was apparent in the total number 
of instances required for solution as well 
as in the relevance of first verbalized hy- 
pothesis and number of inference errors. 
However, it is most interesting to note 
that while assimilation of information 
was apparently highly efficient, the sub- 
jects, as a group, failed to utilize all 
objectively available information. Of 
particular interest is the failure of the 
many subjects to take advantage of the 
experimenter as an additional source of 
information. Since the subjects were 
allowed a single hypothesis after each 
selection from the array, each invalida- 
tion by the experimenter, if utilized 
properly, permitted the elimination of 
an additional hypothesis. While vir- 
tually all subjects verbalized a relevant 
first hypothesis, it is interesting to note 
that a sizable minority of 48 subjects did 
not verbalize a hypothesis until after 
three or more selections from the array. 
The possible advantage given by an ac- 
tive search for a solution as evidenced 
by hypothesis spewing in conjunction
with utilization of the experimenter as a 
validational source is suggested by the 
fact that a clear linear relationship was 
obtained between number of instances 
prior to verbalization of the first hypoth- 
esis and total instances to solution. In 
other words, the earlier the subject 
ventured a hypothesis as to the nature 
of the concept, the fewer total instances 
he required for solution. Since this 
analysis was obviously post hoc, a formal 
analysis of data is not reported and the 
following remarks are offered in the 
spirit of suggestions for future research. 

Taking into account the informational 
advantages that accrue to the subject 
who seeks validation from the experi- 
menter, as well as the possibility of a 


chance solution not available to the 
subject who does not emit hypotheses, 
it is not at all surprising that a relation- 
ship exists between the point of appear- 
ance of the first hypothesis and the total 
number of instances to solution. However,
differences among subjects in the
point of appearance of the first hypothesis
(number of prior selections of instances)
are curious. The most parsimonious
explanation is simply that
differential intellective ability among the 
subjects is related to both the point of 
appearance of the first hypothesis as well 
as total instances to solution. In addi- 
tion, it is possible that some subjects 
took the “avoid guessing” instructions
much more seriously than others. 
Considering the as yet unresolved 
problem of explication of large individ- 
ual differences routinely obtained in con- 
cept attainment studies, a plausible but 
slightly more complicated conjecture in- 
volves differential sets toward valida- 
tional responses of the experimenter. 
Subjects who establish an “information 
focus” could conceivably perceive the 
experimenter’s invalidations or possible 
invalidations as a source of information 
useful in solution and as such, positively 
reinforcing. On the other hand, subjects 
set to perceive experimenter’s invalida- 
tions as punishment establish an ex- 
pectancy of negative reinforcement and, 
reluctant to engage in hypothesis spew- 
ing, persist in the selection of instances 
from the array until virtually certain 
of solution. Whatever the reason for the 
subjects’ delays in verbalizing hypoth- 
eses, the data indicate that for approxi- 
mately 66% of the subjects, solution 
was achieved with only one, or no, in- 
stance after the first verbalized hypothe- 
sis regardless of the point of appearance. 
It is not completely clear why Condi- 
tions II and IV emitted more redundant 
hypotheses, but the result was that they 
had more opportunities for confirmation 
during the attainment process. Across
all measures obtained, Condition IV, 
reduced array, was so consistently differ- 
ent from the other conditions in the 
direction of reduced efficiency that it is 
the writers’ belief that Condition IV was 
very likely different in its effect from 
the other conditions. It may well be 
that the negative and redundant in- 
stances removed from the array have in 
some way an informative or confirming 
effect when they are present. 

As mentioned above, these tentative
remarks are offered as suggestions for
future research. Since active search for
solution through hypothesis spewing
appears as a possible advantageous strategy
in concept attainment studies involving
experimenter feedback, possible
differential expectancies toward the
experimenter’s validational responses appear
important. At any rate, all factors
affecting active overt hypothesis formation
by the subjects constitute important
areas of investigation.

WORD LEARNING IN AN AUTOMATED
TEACHING SITUATION AS A FUNCTION
OF DISPLAY CONDITION 1

STANLEY WECHKIN
Edward R. Johnstone Training and Research Center, Bordentown, New Jersey


An experiment was conducted to ascertain the more efficient mode of
presenting materials for word learning in a multiple choice automated
teaching situation. Using 32 delinquent Ss, it was found that a situation
in which visual response alternatives were coupled with an auditory
stimulus gave faster learning (p < .001) and consequent better retention
than one using auditory response alternatives to visual stimulus. These
results were attributed to differential fading of the stimulus trace.
Differential transfer (p < .01) in the 2 conditions was attributed to degree
of completed learning.


In teaching subjects with little or no 
reading achievement to “read” a word, 
i.e., emit the correct oral equivalent of
an orthographic symbol, in an automated
teaching situation, one is confronted
with two givens. First, there is
a strong predilection towards multiple-choice
devices, since it is extremely difficult
to construct a machine which recognizes
correct composed responses, or for
the subject to do so himself. Secondly,
the device must present both the written
word and its auditory equivalent. A
problem arises as to the best display arrangement
of the material in terms of
giving the fastest learning rate and/or
the greatest retention. Specifically, 
should the printed word to be learned 
be presented visually and the correct 
equivalent and its foils auditorily, or 
should the equivalent be presented auditorily
and the correct orthographic symbol
and its foils visually? This question
touches upon two somewhat separate 
problems: the efficiency of learning in 
the auditory and visual modality, and 
the relative facilitation of learning as a 


1 This investigation was supported by Research
Grant 7-28-073.00 from the United
States Office of Education. The author is
grateful to Belver C. Griffith and Ronald S.
Lipman for their helpful comments.




function of stimulus and response mean- 
ingfulness. 

The available evidence on learning as 
a function of the sensory modality in- 
volved (as reviewed by Henmon, 1912; 
Koch, 1930; McGeoch & Irion, 1952, 
pp. 480-484) is confined to studies in 
which the stimulus and response terms 
were of a given modality or combination 
of modalities. Stimulus and response 
terms in these studies were both auditory 
or visual, or the stimulus terms were 
both auditory and visual and response 
was oral and/or orthographic. This 
literature indicates that there is much 
disagreement as to the most efficient 
modality. More germane evidence is in 
the area of the relative meaningfulness 
and familiarity of stimulus and response 
terms. In terms of the present problem 
the auditory term, if it is a common 
word, would be meaningful; and the 
visual term, the printed word, would be 
initially, relatively meaningless. The 
studies of Cieutat, Stockwell, and Noble 
(1958), and Hunt (1959) show that 
high response meaningfulness is more 
important in accelerating learning rate 
than high stimulus meaningfulness and 
would indicate that the condition under 
which auditory response alternatives are 
used should give faster learning. In 
addition, this problem can be viewed in
terms of the transfer paradigm, with the 
auditory response alternatives condition 
involving positive transfer and, hence, 
faster learning because the subject is 
connecting an old response, a common 
spoken word, to a new stimulus. In the 
visual alternatives situation there would 
be negative transfer and slower learning, 
with the subject connecting a new ortho- 
graphic response to the old stimulus of 
the spoken word. 

The present study was undertaken to 
shed empirical light on this subject. 
Trigrams were used to ensure a situation
where a surrogate for the printed word would
be unknown. In order to make the two
situations comparable in regard to tem- 
poral aspects of response terms, it was 
decided to forego the advantage that 
might accrue to a visual response al- 
ternatives condition in an actual learn- 
ing situation as a result of concurrent 
presentation of alternatives, and present 
them consecutively instead. 


METHOD 


Materials 


Two equivalent lists were used for each of 
the two response conditions: A_I and A_II for
the auditory alternative response condition
(Lists I and II), and V_I and V_II for the
visual alternative response condition (Lists I 
and II). The stimulus items for the A lists 
and the response items for the V lists were 
printed, low association, three-letter consonant 
units used by Witmer (1935). The response 
terms for the A lists and stimulus terms for the 
V lists were spoken, high frequency, four-letter 
monosyllabic nouns. There was low similarity 
among stimulus items and among response 
terms for any given list, as well as between 
stimulus and response pairs. There was a 
total of six paired associates in each list. Each 
stimulus item was followed by two response 
alternatives and this constituted a set. The 
incorrect alternative for the stimulus item in a 
set was the correct one for the stimulus item 
in another set, i.e., alternatives were “inside.” 
Six sets of stimulus items and response alterna- 
tives composed a trial, and the trials were ar- 
ranged in cycles of four. The order of the six 


sets within a trial was randomized. The al- 
ternatives were arranged so that the incorrect 
response item was different for each stimulus 
item in the four-trial cycle. The stimulus- 
response alternative combinations and their 
order in the two conditions were the same for 
any list number. For example, in List A_I the
first set consisted of the visual stimulus XFQ
followed by the spoken response alternatives
WIND and CAKE, with CAKE correct. The second
set of this list was ray followed by HAIR and
AUNT, with HAIR correct, and so forth, until
all six pairs had been presented. List V_I was
the analog of this. The first set here consisted
of the auditory stimulus CAKE followed by the
visual response alternatives cow and XFQ, with
XFQ correct. The second set was HAIR followed
by FHy and zxy with rxy correct. Similarly,
Lists A_II and V_II were analogs of each other.
The reason for this method of list construction 
was to prevent uncontrolled variation in diffi- 
culty of experimental conditions as a result of 
differences in ease of association of particular 
stimulus-response combinations. To prevent 
position guessing habits from yielding similar 
variation, the randomized position of the cor- 
rect response alternative (first or second) was 
the same for corresponding sets in all lists. 
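
The construction rules above (six pairs, "inside" incorrect alternatives, a different incorrect alternative for each stimulus on each trial of a four-trial cycle, randomized set order and position of the correct alternative) can be illustrated with a minimal sketch. The cyclic-shift rule for picking the incorrect alternative and the placeholder items S1 to S6 and R1 to R6 are assumptions made for illustration; the report does not say how incorrect alternatives were actually assigned.

```python
import random


def build_cycle(pairs, n_trials=4, seed=0):
    """Build one cycle of trials with "inside" incorrect alternatives.

    pairs: list of (stimulus, correct_response) tuples.
    The cyclic-shift rule used to pick foils is an assumption of this sketch.
    """
    n = len(pairs)
    if n_trials >= n:
        raise ValueError("need more pairs than trials so foils can differ")
    rng = random.Random(seed)
    cycle = []
    for shift in range(1, n_trials + 1):
        trial = []
        for i, (stimulus, correct) in enumerate(pairs):
            # The foil is the correct response of another set in the list.
            foil = pairs[(i + shift) % n][1]
            alternatives = [correct, foil]
            rng.shuffle(alternatives)  # randomize first/second position
            trial.append({
                "stimulus": stimulus,
                "alternatives": tuple(alternatives),
                "correct": correct,
            })
        rng.shuffle(trial)  # randomize the order of the sets within the trial
        cycle.append(trial)
    return cycle


# Hypothetical placeholder items, not the materials used in the study.
PAIRS = [(f"S{k}", f"R{k}") for k in range(1, 7)]
CYCLE = build_cycle(PAIRS)
```

Reusing the same seed for analogous lists keeps the position of the correct alternative identical for corresponding sets, which is the control the paragraph above describes.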


Apparatus 


The consonant units were presented visually 
on slides projected on a wall and the common 
words were recorded on magnetic tape and 
presented auditorily. The presentation device 
was a LaBelle Maestro II combination tape 
recorder and slide projector (manufactured by 
the LaBelle Corporation, Oconomowoc, Wis- 
consin). This device allowed for a signal 
placed on the magnetic tape to trip the slides, 
and all presentations within a set were per- 
formed automatically. 


Subjects 

Thirty-two residents of the State Home for 
Girls in Trenton, New Jersey, were used. The 
subjects were all between the ages of 14 and 17 
years, and had a minimum of fourth grade 
reading skills and a minimum IQ of 90. 


Procedure 


The auditory response alternative lists were 
presented in the following manner. When the 
apparatus was activated the visual stimulus 
consonant unit was presented for 2 seconds. 
One second after its removal the first auditory 
response alternative was presented, consuming 
1/2 second. After a 2.6-second delay, the second
auditory response alternative was presented.
Following this, the experimenter stopped the 
machine, waited for the subject’s response, gave 
feedback, recorded the response, and activated 
the machine for the next set. In the visual re- 
sponse alternatives condition, the auditory 
stimulus word was presented for 1/2 second,
followed by a 1-second delay before the pres- 
entation of the first visual response alternative 
for 2 seconds. This was followed, 1 second 
after its removal, by the second alternative for 
a 2-second exposure. After its removal the 
experimenter stopped the machine and pro- 
ceeded as in the auditory condition. Thus, in 
both conditions the time between onset of 
stimulus and termination of the exposure of the 
second response alternative was 6.5 seconds 
with a constant 1-second interval between the 
conclusion of the presentation of the stimulus 
item and the onset of the first response alterna- 
tive. Because of the desire to hold these two 
time intervals constant, and the differential 
temporal nature of auditory and visual ex- 
posure modes, the exposure times of stimulus 
and response items and the times between re- 
sponse alternatives were different for the ex- 
perimental conditions. 
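
As a check on the timing just described, the visual response alternatives condition can be laid out as a sequence of events and summed. This is only a sketch: the half-second duration of the spoken word is taken as an assumption, and the event labels are illustrative.

```python
# Durations in seconds for the visual response alternatives condition.
# The 0.5-second duration of the spoken stimulus word is an assumption.
VISUAL_RESPONSE_CONDITION = [
    ("auditory stimulus word", 0.5),
    ("interval", 1.0),
    ("first visual alternative", 2.0),
    ("interval", 1.0),
    ("second visual alternative", 2.0),
]


def onsets_and_total(events):
    """Return (label, onset) pairs and the time at which the last event ends."""
    elapsed = 0.0
    onsets = []
    for label, duration in events:
        onsets.append((label, elapsed))
        elapsed += duration
    return onsets, elapsed


ONSETS, TOTAL = onsets_and_total(VISUAL_RESPONSE_CONDITION)
```

Under these durations the second alternative ends 6.5 seconds after stimulus onset, and the first alternative begins 1 second after the stimulus ends.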

Each subject was given two lists to learn, one 
on each of 2 consecutive days. The study was 
initially designed to have each subject serve as 
his own control, and the first 16 subjects were 
randomly assigned to Groups A_I-V_II (the first
two symbols, A_I, refer to the first day’s re-
sponse condition and list number, respectively;
the second two symbols, V_II, to the second
day’s response condition and list number),
A_II-V_I, V_I-A_II, and V_II-A_I. When these subjects
had been run, it became apparent that differ-
ential transfer effects were confounding Day 2
scores. For this reason, the second group of
16 subjects was randomly assigned to Groups
A_I-A_II, A_II-A_I, V_I-V_II, and V_II-V_I. Thus there
were four subjects in each of the eight groups 


that could be composed by varying condition, 
list number, and order. 
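
The eight groups can be enumerated mechanically from the three factors named above. The sketch below assumes that each subject received the other list on Day 2, and the label format (for example A_I-V_II) is a notation chosen for illustration.

```python
from itertools import product


def enumerate_groups():
    """List the Day 1 / Day 2 groups formed by condition, list number, and order.

    Assumes the Day 2 list is always the list not used on Day 1.
    """
    groups = []
    for same_condition, day1_cond, day1_list in product(
        (False, True), ("A", "V"), ("I", "II")
    ):
        other_cond = "V" if day1_cond == "A" else "A"
        day2_cond = day1_cond if same_condition else other_cond
        day2_list = "II" if day1_list == "I" else "I"
        groups.append(f"{day1_cond}_{day1_list}-{day2_cond}_{day2_list}")
    return groups


GROUPS = enumerate_groups()
SUBJECTS_PER_GROUP = 32 // len(GROUPS)  # 32 subjects spread evenly
```

The first four labels are the changed-condition groups run first; the last four are the same-condition groups added later.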

Prior to the presentation of the first list, 
each subject was told what a consonant unit 
was, and that she was to learn to connect this 
with a common word. She was told that she 
would have no idea at the beginning which was 
the correct alternative and to guess, but to try 
and remember the correct one. The subject was 
asked to say “First” if she thought the first re- 
sponse alternative was the correct one, and 
“Second” if she thought it was the second one. 
The experimenter said “Right” or “Wrong.” 
The subjects were instructed to learn as quickly 
as possible and to guess when not sure. Cri- 
terion was errorless performance on 2 succes- 
sive trials, and the cycles of 4 trials (or 24 
sets) were repeated until criterion or the maxi- 
mum of 36 trials. Prior to learning the second 
list on Day 2, retention was tested by present- 
ing the subject with the printed individual con- 
sonant units in random order and asking for 
the common word associated with it. Guessing 
was encouraged. Retention of the list learned 
on Day 2 was tested on Day 3. A total of 23- 
24 hours separated each session, and the same 
experimenter ran all the subjects on all sessions. 
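
The learning criterion described above (errorless performance on 2 successive trials, with a maximum of 36 trials) can be expressed as a small scoring routine. This is a sketch under two assumptions that the report does not settle: that the score counts the criterion trials themselves, and that a subject who never reaches criterion is assigned the maximum of 36.

```python
def trials_to_criterion(errors_per_trial, run_length=2, max_trials=36):
    """Return (score, reached) for one subject.

    errors_per_trial: number of errors on each trial, in order.
    The score is the trial on which the run of errorless trials is completed
    (an assumed convention); subjects who never reach it get max_trials.
    """
    streak = 0
    for trial_number, errors in enumerate(errors_per_trial[:max_trials], start=1):
        streak = streak + 1 if errors == 0 else 0
        if streak == run_length:
            return trial_number, True
    return max_trials, False
```

For example, a subject with error counts 3, 1, 0, 2, 0, 0 reaches criterion on Trial 6, because the single errorless trial on Trial 3 is broken by the errors on Trial 4.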


RESULTS 


The means of the learning and reten- 
tion scores by condition and day are 
shown in Table 1. Because there ap- 
peared to be overall heterogeneity of 
variance, parametric analysis was per- 
formed only on those data where hetero- 
geneity could not be accepted at the 
.05 level by the Bartlett test. Since list
number was not an important variable 
(Fs for list number on each day as well 
as those for its interaction with condition 


TABLE 1

MEAN TRIALS TO CRITERION AND AMOUNT RETAINED AS A FUNCTION OF PRESENTATION MODE

Column headings: Mode (Days 1 and 2); Day 1: Trials, SD, Retention, SD; Day 2: Trials, SD, Retention, SD.

Surviving entries (row labels missing):
1.38, 2.63, 3.88, 4.80
Trials (SD): 24.13 (10.52), 17.88 (7.59), 13.13 (8.64), 10.50 (4.64)

Note. N per entry = 8.
on each day were < 1.00), it is not con-
sidered. The retention scores appear to 
reflect rate of learning and whether or 
not criterion was reached, and are not 
treated separately, since six of the eight 
Spearman rho’s between trials-to-cri- 
terion and number of correct retention 
responses, calculated by taking each 
group and day separately, were negative. 
The differences attributed to display 
mode (condition) are highly significant 
on Day 1 (Mann-Whitney U=254.5, 
p < .001), with visual mode clearly su- 
perior. Of the 16 subjects in each Day 1 
group, 8 of the auditory subjects failed 
to reach criterion by Trial 36, while the 
slowest visual subject reached it by 
Trial 16. 

The summary of analysis of variance 
of Day 2 scores is shown in Table 2. 
These data are put in the form of the 
transfer paradigm (cf. Noble, 1961). 
The effect of display mode here is not by 
itself statistically significant; rather, it 
was affected by the first day’s condition 
and/or performance. The two groups 
which had the visual condition on Day 1 
and, hence, fewer trials-to-criterion on 
that day did better in the transfer (Day 
2) phase than those that had the audi- 
tory condition first. The significance of 
the interaction in Table 2 is attributable 
to the poor scores of the AA group (CR 
of difference at .01 level=10.89). The 


TABLE 2 


SUMMARY OF ANALYSIS OF VARIANCE OF DAY 2 SCORES


Source    df    MS


Training (Day 1 condition constant)
Transfer (Day 2 condition constant)
Interaction
Error (w)
Total


means of Day 1 and Day 2 scores irrespective
of groups—20.84 and 16.41,
respectively—were significantly different 
at the .02 level (F=7.01, df=1/30), 
indicating an overall facilitation as a 
result of the first learning task. The 
effect of similarity of conditions across 
days, or its interaction with days, was 
not statistically significant (both 
Fs< 1.00, df=1/30). However, the cor- 
relation between Day 1 and Day 2 
learning scores for the subjects was 
positive for those groups having the 
same condition across days: rho = +.49
and +.25 for Groups AA and VV, re- 
spectively; and negative for groups with 
different conditions: —.11 and —.41 


for Groups AV and VA, respectively. 


DISCUSSION AND CONCLUSIONS 


Two findings warrant attempts at explanation:
the overwhelming superiority of the visual
response condition on Day 1, and the rather
complex interaction effects reflected in Day 2 scores.
In respect to Day 1 scores, it is possible
that the initial formulations of stimulus 
and response were inadequate. Under 
both conditions the “stimulus” word was 
absent at the onset of the response al-
ternatives and the subject therefore had
to retain the stimulus term while making 
a judgment. In the visual response con- 
dition the trace that had to be retained 
was a simple, common word; in the 
auditory condition it was an unpro- 
nounceable, low association trigram. It 
may be supposed that the trigram trace 
was less deeply imprinted and faded 
more quickly than that of the common 
word, and that, hence, the subject in the 
visual response condition, in making the 
judgment of the correct alternative, was 
responding in the presence of a stronger 
stimulus than the subject in the auditory 
response condition. In other words, 
under both conditions two responses had 
to be made, a retention response, and a 
judgmental, or associating, response;
and the first of these responses was 
harder to make and stood a lesser likeli- 
hood of being successful under the audi- 
tory condition. This is admittedly a 
post hoc explanation, but one which 
would be necessary to prevent an inter- 
pretation of the results which would run 
counter to the previously cited evidence 
on response meaningfulness and transfer. 
It is testable in that one would predict 
that the auditory response condition, 
with its greater response meaningful- 
ness, would be the superior one in a 
situation where the differential stimulus 
deterioration did not obtain (e.g., where 
there was presentation of the stimulus 
simultaneously with the response al- 
ternatives). 

The transfer effects were not origi- 
nally an item of primary interest, but 
rather the result of the self-control de- 
sign, and here again, though perhaps 
more justifiably, a post hoc explanation 
is attempted. There appears to be noth- 
ing about the two learning conditions 
themselves which would give differential 
transfer, other than their innate diffi- 
VOCABULARIES
An experimental test of 40 items given to 257 3rd-, 6th-, and 9th-grade
pupils revealed a definite decrease in choice of concrete definitions and 
increase in functional and abstract choices at each higher level, and 
especially between the 3rd and 6th grades. Significant differences were 
obtained between all categories except functional choices in the 6th and 
9th grades. When children were classified according to their dominant 
response on the vocabulary test (at least 40% of the choices in that area) 
it was found that there was a significant difference between the mixed
and both the functional and abstract groups at the 1% level of
confidence.





The important fact about a child’s 
vocabulary may be, not the number of 
words he recognizes superficially, but 
the quality of his associations with dif- 
ferent words. We know something about 
a child’s vocabulary abilities when we 
test that he can recognize and give one 
meaning for all the words on his reading, 
his spelling, or his social studies lists. 
We know more about his abilities when 
we record the kinds of meanings he 
attaches to words like PARENT, SCHOOL, 
COMMUNITY, or even like COOPERATION, 
JUSTICE, and DEMOCRACY. Most vocabulary
studies, such as the classic ones by
Thorndike, Horn, and Rinsland and the 
more recent one by Dale and Eichholz 
(1961) are quantitative studies involv- 
ing some form of frequency count. 
Qualitative studies of the kinds of mean- 
ings children acquire and use at various 
ages are also needed. As Nisbet (1960) 
suggested in a review of vocabulary stud- 
ies, “We seem to be moving away from 
the original idea of a massive general list 
of the Thorndike type toward the spe- 
cific inquiry whose results have direct 
application in a more limited sphere.” 
(p. 60) 

One place for extending vocabulary 
research is into the relatively unexplored 
area of the quality of children’s re- 
sponses to known words. These have 


been studied by Feifel and Lorge (1950) 
in terms of recall in the form of defini- 
tions given on the Stanford-Binet Test. 
They found that children aged 6-7 years 
gave definitions which could be described 
as including more use and description, 
and less explanation. Children aged 
10-14 years defined words in terms of 
“class” features or abstractions in their 
use of synonym and explanation type
responses. In other words, in recall
procedures between 9 and 15 years the
character of the definition of a word
changes, perhaps because of the acquisi- 
tion of different types of understanding. 
Hurlburt (1954) has pointed out, how- 
ever, that recall and recognition tech- 
niques for measurement of precise 
knowledge of word meanings have a 
limited number of common factors and, 
therefore, that both techniques must be 
used. The present study enlarges the 
scope of the Feifel and Lorge research 
by investigating qualitative levels of re- 
sponse in the recognition situation, prob- 
ably much commoner in children’s ac- 
tivities such as reading than when they 
are asked to give an oral definition of an 
isolated word as in a Stanford-Binet test. 
It is concerned with the question: When 
a child has several alternative meanings 
before him which level of meaning, con- 
crete, functional, or abstract does he 
choose as the “best” meaning? 


This question was also investigated by
Kruglov (1953) using a 10-word, multi- 
ple-choice recognition test in Grades 3, 
5, 7, and 8 and with college students. 
Kruglov found in the higher grades an 
increasing choice of synonyms and a de- 
creasing use of repetition-illustration- 
inferior explanation, but no significant 
differences between grade levels in use 
and description or explanation types of
choices. The present investigation ex- 
plored children’s choices in categories 
different from those used by Kruglov. It 
grew out of a previous study (Russell, 
1954) which found some evidence that 
children’s breadth of vocabulary knowl- 
edge (multiple meanings) could be 
tapped more easily than their depth of 
vocabulary knowledge in terms of know- 
ing a great deal about one word. The 
categories used are also related to 


Piaget’s (1926) early language study. 
In analysis of his records of children’s 
oral language in an interview situation, 
he believed that responses could be de- 


scribed as autistic, egocentric, and logi- 
cal. Later Piaget (1929, 1930) listed 
17 explanations of these levels; and 
American research workers confirmed in 
part a broader developmental sequence 
in concepts which might be described 
as sensorimotor, concrete perceptual, 
functional, and abstract-conceptual lev- 
els of thinking (Russell, 1960). 


PROCEDURES 


The present study is concerned with the 
questions of whether and when the child per- 
ceives functional or abstract definitions as su- 
perior to concrete, particular definitions or ex- 
amples. A multiple-choice test of 40 words was 
constructed and the definitions checked with 
92% agreement by two judges as being (a) 
concrete, (b) functional, (c) abstract, and (d) 
incorrect. The words used ranged from some 
in the first Thorndike thousand to the sixteenth 
thousand, early items being STAR, COUNT, BRAVE, 
and the most difficult words including INTRIGUE,
AMBIGUOUS, and COERCION. Two of the
items with the categorization of the answers 
were as follows: 


Count: to find the number of things in a 
group (functional), to find how many pennies 
are in your pocket (concrete), to say numbers 
in order—upward or downward (abstract), and 
to tell numbers one after the other (incorrect). 

Ambiguous: when a sentence has two mean- 
ings (concrete), difficult or hard to do (incor- 
rect), it takes many interpretations (func- 
tional), and when something cannot be ex- 
plained or placed definitely (abstract). 

In preparing the test the authors found it 
difficult always to distinguish between func- 
tional and abstract definitions. In general, 
functional was interpreted to mean: the func- 
tion the word performs or what we do when 
we execute or perform the word. An abstract 
definition was considered a general statement 
applying to a category or class and without 
reference to specific example or function. 

After a preliminary trial, the test was given a
second run with 257 children in three third- 
grade, three sixth-grade and three ninth-grade 
classes in Berkeley, California. The test was 
untimed with the test administrator reading 
orally each test item twice, once for the chil- 
dren to consider the whole item, the second 
time to mark “the one best meaning of the 
word.” Although the test needs some further 
revision the results of the first administration 
seemed of sufficient interest to be put into the 
present report. 


RESULTS 


Table 1 reveals some dominance of 
concrete and functional choices by the 
third-grade children with considerable 
decline of concrete choices in the sixth 
and ninth grades by these pupils. Corre- 
latively, the number of both functional 
and abstract choices increases in the 
sixth and the ninth grades. When the 
differences of the means of the three 
categories between sixth and third, ninth 
and third, and ninth and sixth grades, 
respectively, were checked by the t test,
they were found to be significant beyond 
the 1% level for all the pairs of scores 
in the table, with one exception. There 
was no significant difference in the 
scores on functional responses between 
the sixth and ninth grades. Accordingly, 
the results suggest that these sixth grad- 
ers and ninth graders both choose fewer 
TABLE 1

SIGNIFICANCE OF DIFFERENCE BETWEEN MEAN SCORES REFLECTING DEFINITIONS CHOSEN AS
“BEST” BY 257 CHILDREN

Choice        Grade 3 M    Grade 6 M
Concrete        12.81
Functional      10.18
Abstract         9.16
Errors           6.59
Omitted          1.24

* Significant at .01 level, t 2.76

concrete definitions as “best” than do 
the third graders, and that the ninth 
graders choose slightly fewer concrete 
responses than the sixth graders choose. 
The table suggests that both the sixth 
graders and the ninth graders choose 
more functional and more abstract defi- 
nitions than the third graders do and 
that the ninth-grade pupils choose a few 
more abstract definitions than the sixth- 


grade pupils mark as “best.” Errors in 
definitions also decrease from grade to 
grade. In every case, for the test used, 
numerical differences of the means are 
greater between the third and sixth 
grades than between the sixth and ninth 
grades. 


Characteristics of the Experimental Test 
of Level of Meaning 


Some of the most interesting data of 
the present study come from an analysis 
of the test itself. Some of the usual 
checks applied in item analysis, in a 
test of reliability by split-halves, and in 
external validation do not seem to apply 
to a test which has more than one “cor- 
rect” answer and which attempts to get 
at qualitative differences. For example, 
item analysis revealed that certain words 
apparently influenced the children in the 
direction of abstract meanings while 
others did not. For the third, sixth, and 


              Grade 9 M    t values
                           Grades 3 and 6    Grades 6 and 9
Concrete                      18.91*             4.60*
Functional                     8.85*             1.43
Abstract                       9.76*             4.65*

ninth grades the percentages of pupils
checking the abstract meaning of EXPERIENCE
(Item 14) were 28, 56, and 74%, respectively;
but the percentages checking the functional
definition were 23, 28, and 21%. On the other
hand, the percentages checking the abstract
definition of FARMER (Item 7) were 18, 43, and
47%, and the percentages checking the functional
definition were 20, 36, and 47%. In other words,
EXPERIENCE may lend itself to abstract 
meanings, and FARMER to functional 
ones, even at the ninth grade. As sug- 
gested above, the constructors of the test 
had more difficulty in distinguishing be- 
tween functional and abstract definitions 
than between any other type of pairing. 

Because of the nature of the test, a 
check on reliability using split-halves 
has no meaning. However, one measure 
of reliability was obtained by repeating 
the test after approximately 4 months 
in three sixth-grade classes. Using the 
Kendall (1955) rank correlation coefficient τ,
it was found that the test-retest agreement
for concrete, functional, and abstract meanings
in the three sixth-grade classes was .62, .79,
and .82, respectively. If the test has
validity, however, some change in items 
from concrete to functional, or from 
functional to abstract might be expected 
over 4 months’ time. For these sixth- 
grade classes, the changes from concrete 





QUALITATIVE LEVELS IN VOCABULARIES 


TABLE 2


SIGNIFICANCE OF DIFFERENCE BETWEEN MEANS OF VOCABULARY SCORES MADE BY SIXTH GRADE
CHILDREN AFTER CATEGORIZING THEIR CHOICES AS ABSTRACT, FUNCTIONAL, AND MIXED


Choice        N     M
Functional    27    66.48
Abstract      12    61.2
Mixed         16    39.9

* Significant at .01 level, Cochran and Cox test


to functional occurred in 16 of the items 
and from functional to abstract in 21. 

A further check on validity was ob- 
tained by correlating results on the ex- 
perimental test with the children’s scores 
on a standardized vocabulary test. For 
this purpose, choices of abstract, func- 
tional, concrete, and error items were 
arbitrarily rated 3, 2, 1, and 0. For 78 
cases the total score thus computed 
correlated with the standard vocabulary 
score with an r of .70. 
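
The arbitrary rating just described can be written out as a short scoring routine. This is a sketch only: the category labels are illustrative strings, and scoring an omitted item as 0 is an assumption, since the report does not say how omissions were treated.

```python
# Ratings stated in the text: abstract 3, functional 2, concrete 1, error 0.
RATINGS = {"abstract": 3, "functional": 2, "concrete": 1, "error": 0}


def total_score(choices):
    """Sum the ratings of a pupil's choices.

    choices: one category label per test item.
    Omitted items (label "omitted") are scored 0; this is an assumption.
    """
    score = 0
    for choice in choices:
        if choice == "omitted":
            continue
        score += RATINGS[choice]
    return score
```

With 40 items the total can range from 0 to 120; a pupil choosing 20 abstract, 12 functional, and 8 concrete definitions would score 60 + 24 + 8 = 92.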

A third check on validity was obtained 
through classifying students on the 
above mentioned vocabulary test on the 
basis of their dominant preference on 
the experimental test. Each pupil’s 
dominant preference for any one cate- 
gory of meaning was identified arbi- 
trarily on the basis of at least 40% of 
the answers in that category and at least 
10% less on any other category. Those 
who failed to meet the above arbitrary 
criterion were put in a “mixed” category.
A comparison among the means of the
vocabulary scores in the three categories,
using the Cochran and Cox (Edwards, 1950)
approximation method for the t test,
is shown in Table 2.
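
The classification rule for dominant preference can be sketched as follows. Two readings are assumed here and are not confirmed by the report: percentages are taken over all of a pupil's answers, and "at least 10% less on any other category" is read as every other category falling at least 10 percentage points below the dominant one. Integer arithmetic is used so that boundary cases such as exactly 40% are not lost to rounding.

```python
from collections import Counter

CATEGORIES = ("concrete", "functional", "abstract")


def dominant_preference(choices, min_share_pct=40, min_margin_pct=10):
    """Classify a pupil as concrete, functional, abstract, or mixed.

    choices: one label per test item (labels outside CATEGORIES, such as
    errors, count toward the total but cannot be dominant).
    """
    total = len(choices)
    if total == 0:
        raise ValueError("no choices to classify")
    counts = Counter(choices)
    top = max(CATEGORIES, key=lambda c: counts[c])
    # Compare counts scaled by 100 against percentage thresholds times total.
    enough_share = counts[top] * 100 >= min_share_pct * total
    enough_margin = all(
        (counts[top] - counts[c]) * 100 >= min_margin_pct * total
        for c in CATEGORIES
        if c != top
    )
    return top if enough_share and enough_margin else "mixed"
```

On a 40-item test, 20 abstract and 16 functional choices (50% against 40%) would just qualify as abstract, whereas 16 abstract and 14 functional choices (40% against 35%) would be classed as mixed.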

These results suggest that children in 
the sixth grade may be categorized (with 
some certainty) as selecting meanings 
either in abstract or mixed and func- 
tional or mixed groups but that the 


[Table 2 column headings: t values obtained in comparing with Abstract; t values obtained in comparing with Functional]

present test failed to distinguish between
children who select meanings at
abstract or functional levels. 


CONCLUSIONS 


Children’s vocabulary abilities should 
probably be scored for breadth and 
depth of meanings and level of definition 
selected as “best,” as well as by purely
quantitative measures. The present at- 
tempt to study quality of understanding 
through recognition abilities differs from 
measurement of recall abilities and may 
be important in the process of compre- 
hension during reading. A test of quality 
of definitions chosen, however, cannot 
be checked easily by the usual measures 
of reliability and validity and so other 
ways of doing this are described above. 
One of the chief limitations of the pres- 
ent procedure is the heavy reliance on 
the form of response and it may be that 
differences obtained were a function of 
the wording of the responses rather than 
the nature of the word or its definition. 
For instance, in an example given above, 
the word INTERPRETATIONS may be a 
difficult word for third graders and may 
steer them away from a functional
choice. Furthermore, a test of qualita- 
tive responses may be considered as 
much a test of mental ability as a test 
of vocabulary. If it seems desirable to 
study children’s vocabularies in other
than quantitative terms, the writers are 
of the opinion that tests such as an im- 


proved version of the one used in this 
study may yield useful data about both 
intelligence and vocabulary development. 
SET APPLIED TO STUDENT TEACHING
M. C. WITTROCK 
University of California, Los Angeles 


To determine if a set to teach for pupil gain influences learning, student 
teachers in the experimental group were told that their final course grades 
in educational psychology and in student teaching depended upon the 
amount of pupil gain. Differences between pretests and posttests for 787 
secondary school students of English, American government and history, 
and social studies indicated that the experimental group of teachers 
produced more (p < .001, 2-tailed) student gain than did a control
group of student teachers. An analysis of covariance of the posttest
scores adjusted for the pretest scores supported the above result
(p < .05). It was concluded that the concept of set finds application
in teaching.





A set is assumed to increase the prob- 
ability of the occurrence of certain re- 
sponses and to decrease the probability 
of occurrence of other responses usually 
through selecting, directing, or organiz- 
ing some part of experience. The various 
uses of the term “set” range from motor 
set to goal set. Included between these 
two extremes are, among others, learning 
set, response set, task set, and methods 
of problem solving aroused by directions 


or by problem situations. For reviews of 
the concept see Young (1961, pp. 264— 
278) and Gibson (1941). 

There has been much research in psychology
on set, but comparatively little
attention has been given to it in educa- 


tion. This is surprising since school 
teachers dispense verbal instructions and 
give other sets to students almost daily. 
One method of developing a set in sub- 
jects is by appropriately reinforcing trial 
and error situations (Harlow, 1949) to 
allow the subjects to “discover” when 
certain sets are appropriate. Another 
method of developing sets in subjects is 
simply to use some form of “reception 
learning,” e.g., verbal instructions 


1 The author wishes to thank the following 
University of California, Los Angeles, professors
for their important roles in the design
and execution of this study: J. A. Bond, T. R. 
Husek, E. R. Keislar, D. A. Leton, J. D. 
McNeil, and A. G. Sorenson. 
(Wittrock, in press). The present study
employed the latter technique. As used 
in this study, set referred to the tem- 
porary influence upon the behavior of 
the student teachers, which was pro- 
duced by certain verbal statements de- 
signed to make explicit a particular 
teaching objective. 

In this study reinforcement, i.e., final 
course grades for the student teachers 
in the experimental group, was made 
contingent upon the gain that their sec- 
ondary school students evidenced on 
standardized achievement tests given as 
pretests and posttests. 

It is hypothesized that an explicit set 
to teach for pupil gain on a standardized 
achievement test results in a change in 
the teachers’ behavior, which produces 
greater pupil gain than does a compa- 
rable procedure that does not specify 
pupil gain as the criterion. 


METHOD 


Subjects 

The experimental and control groups of 14 
student teachers each were divided by subject 
matter as follows: English—three, American 
government and history—four, and social 
studies—seven. There were seven women and 
seven men in the experimental group. Nine of 
the experimental group were teaching in their 
major field, and five of the group were teach- 
ing in their minor field. The control group 
consisted of eight women and six men: 10 were 
teaching in their major field, and 4 were
teaching in their minor field. Five factors were 
used to match the student teachers individ- 
ually: teaching major, sex, age, a measure of 
ability (Cooperative Test Division, 1950), and 
the public school in which the student teaching 
was performed. All of the student teachers 
were regularly enrolled in the University of 
California, Los Angeles, teacher education pro- 
gram. 

A total of 787 junior high and senior high 
school students were enrolled in the classes 
taught by the 28 student teachers mentioned 
above. The 395 students of the experimental 
group were divided as follows: students in 
social studies—215, students in American gov- 
ernment and history—99, and students in Eng- 
lish—81. In the control group there were 392 
students of which 199 were enrolled in social 
studies, 106 in American government and his- 
tory, and 87 in English. The students in so- 
cial studies were enrolled in four junior high 
schools, and the students in American govern- 
ment and history and in English were enrolled 
in one senior high school. All five of the 
schools lie in the area commonly referred to as 
West Los Angeles. None of the above classes 
were homogeneously grouped according to abil- 
ity or achievement. 


Materials 


In the social studies classes the Cooperative 
Social Studies Test, Grades 7, 8, and 9 (Lloyd, 
1948), Parts II and III, were used for both 
pretests and posttests. In the area of American 
government, the Cooperative American Gov- 
ernment Test (Haefner, 1947) was used for 
both pretests and posttests. In the American 
history classes the Cooperative American His- 
tory Test (Berg, 1948) was used. In all three of 
the above areas Form X of the appropriate test 
was used for the pretest, and Form Y of the 
appropriate test was used as the posttest. In 
the English classes the Cooperative English 
Tests, Parts I and II, which together com- 
prised the Test of English Expression (Co- 
operative Test Division, 1960), were used. 
Form 2A was used as the pretest, and Form 2B 
was used as the posttest. The measure of affec- 
tivity toward the subject matter consisted of a 
five-point scale ranging from one, like very 
much, to three, do not like or dislike, to five, 
dislike very much. 


Procedure 


Three weeks before the beginning of the 
spring semester, 1961, the college supervisors 
of the student teachers were informed that an 
experiment in student teaching and in educa- 
tional psychology was planned for the next 
semester. The cooperating, i.e., training, teach- 
ers of the student teachers in the experimental 
group were each told shortly after the begin- 
ning of the spring semester that the student 
teacher was participating in an experiment and 
that he was to be graded in student teaching on 
the basis of his pupils’ gain. They were told 
further that a pretest and a posttest would be 
given during the semester to the secondary 
school students and that these tests would be 
objective, standardized tests over the appro- 
priate subject matter. They were not told 
which standardized tests were to be used. The 
training teachers of the student teachers in the 
control group received a letter with the same 
information as above except that no mention 
was made of a grading procedure for student 
teachers. All of the training teachers were in- 
formed in these initial letters that they would 
be given each secondary school student’s results 
of each of the tests and that the experiment 
would be explained in detail to them at the end 
of the semester. 

At the first meeting of the class in educa- 
tional psychology, all of the student teachers in 
the areas of social studies, American govern- 
ment and history, and English were informed 
that they were to be enrolled in a special edu- 
cational psychology section and that they were 
to be part of an experiment on student teach- 
ing and educational psychology. All these 
teachers were required to become part of the 
experiment. Two days later, on the first day 
of the special section of educational psychol- 
ogy, the student teachers were told that their 
final course grades in educational psychology 
and in student teaching would be determined 
by the amount of improvement in performance 
on a standardized achievement test, which they 
could obtain with their students, as compared 
with the average amount of improvement of a 
group of secondary school students taught by 
student teachers who had been matched to 
them individually. In order to insure that this 
set to teach for pupil gain continued to exist in 
the students, and, of course, to teach educa- 
tional psychology, the special class continued 
to meet for 2 hours each week throughout the 
semester. The “regular” sections of educational 
psychology also met 2 hours each week. The 
content of the courses was parallel. The author 
taught both the regular and the special sections 
of the course. At no time were the student 
teachers coached regarding pupil gain or how 
to teach for it. The student teachers were not 
allowed to see which tests had been used or 
which tests were to be used in their classrooms; 
they knew only that standardized achievement 
tests were being used. 

During the second week of the semester, 
letters were sent to each of the training teach- 
ers to the effect that a pretest would be given 
in their classrooms at a day convenient to them 
sometime during the fourth week of the semes- 
ter. The teachers were all told to inform their 
secondary school students that a standardized 
achievement test would be given in their classes, 
that this test was a part of an experiment, and 
that their scores on these tests would not be 
used in any way to jeopardize their possibilities 
of entering college. The students were also told 
that their scores on these tests would be made 
available to their teachers. During the fourth 
week of the semester, the pretest was given to 
the classes of the experimental and control 
groups. In each class the experimenter in- 
troduced himself as a representative of the Uni- 
versity of California and reiterated the fact 
that the scores would not be used in any way 
to influence the students’ possibilities of enter- 
ing college but that the scores would be made 
available to the students’ teachers. The experi- 
menter then read aloud the standardized direc- 
tions from the test manual, and the students 
were given the time allotted in the standardized 
directions to complete the examinations. The 
examinations were all scored by machine ac- 
cording to the directions given in the test 
manual. 

A posttest was given 2 weeks prior to the 
end of the semester. All training teachers and 
student teachers were again notified in advance 
about the examination in a manner identical to 
the manner indicated above. The procedure 
that was followed on the test day was identical 
to the procedure outlined above with one ex- 
ception. The students were asked to rate on a 
five-point scale ranging from “like very much” 
to “dislike very much” their attitudes toward 
the subject matter which they had studied that 
semester. 

After the experiment was completed all train- 
ing teachers and college supervisors were in- 
formed by letter of the nature and results of 
the experiment. At the end of the semester the 
student teachers were graded in educational 
psychology solely on the basis of pupil gain as 
compared with the control group teachers’ 
pupil gain. In student teaching, pupil gain was 
combined with the ratings of the supervising 
teachers and of the training teachers to obtain 
the final student teaching grade. In no case did 
the pupil gain score result in a lowering of any 
student teacher’s final grade. 


RESULTS 


Table 1 presents the pretest data. The 
t test between the means of the respec- 
tive experimental and control subgroups 
indicated that there was no significant 
difference in initial performance between 
them (p > .05, two-tailed). 

The difference scores in Table 2 were 
computed by subtracting each student’s 
pretest score from his posttest score. 
Homogeneity of variance tests (Lind- 
quist, 1953, p. 40) for all the compari- 
sons listed in Table 2 produced no sta- 
tistical reason (p > .05) to discredit the 
assumption of homogeneity of variance. 
From Table 2, the difference between 
the means of the experimental and con- 
trol groups was in favor of the experi- 
mental group as predicted (p < .001, 
two-tailed). The English experimental 
and control groups’ means differed sig- 
nificantly (p < .001, two-tailed) in favor 
of the experimental group. The social 
studies experimental and control groups’ 
means differed also (p < .05, two-tailed) 
in favor of the experimental group. The 
difference between the history and gov- 
ernment experimental and control groups’ 
means was not statistically significant at 
the .05 level. However, the difference 
was in the predicted direction. 


TABLE 1 

A COMPARISON OF THE EXPERIMENTAL AND CONTROL GROUPS’ PRETEST SCORES 

Groups 
Social studies, experimental 
Social studies, control 
Government and history, experimental 
Government and history, control 
English, experimental 
English, control 


TABLE 2 

DIFFERENCE SCORES BETWEEN PRETEST AND POSTTEST FOR THE EXPERIMENTAL AND CONTROL GROUPS 

Groups 
Entire experimental group 
Entire control group 
Social studies, experimental 
Social studies, control 
Government and history, experimental 
Government and history, control 
English, experimental 
English, control 
Entire experimental a        7.49 
Entire control a             5.51 

Other surviving cell values, in source order (rows not recoverable): 10.08, 8.69, 5.79, 4.22, 3.68, -0.27 

a N = 14. See Results section of text. 
* Significant at .05 level. 
** Significant at .02 level. 
*** Significant at .001 level. 

The above t tests between the experi- 
mental and the control groups were ap- 
propriate to test the hypothesis men- 
tioned in the introduction. However, a 
more rigorous criterion of whether or 
not a difference existed between the 
experimental and control group of teach- 
ers was obtained as follows: by using 
only the mean of the difference scores 
for all secondary school students in each 
student teacher’s class to measure each 
student teacher’s teaching performance, 
the experimental group of teachers was 
compared with the control group of 
teachers (see Table 2). According to 
this procedure, the teachers of the ex- 
perimental group averaged 7.49 points 
gain between the pretest and the post- 
test. The control teachers averaged 5.51 
points gain between these same two tests. 
By use of the standard error of the dif- 
ference formula for individually matched 
groups, a t of 2.89 (p < .02, two-tailed) 
was found. 
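
The teacher-level comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, the gain values are invented for the example and are not the study's data, and the t statistic is computed with the usual matched-pairs formula (mean of the pair differences divided by its standard error), which is assumed to be the "standard error of the difference formula for individually matched groups" mentioned in the text.

```python
from math import sqrt
from statistics import mean, stdev


def class_mean_gain(pretest, posttest):
    """Mean of (posttest - pretest) over the students in one class."""
    return mean(post - pre for pre, post in zip(pretest, posttest))


def matched_pairs_t(experimental, control):
    """Return (t, df) for individually matched pairs of class-mean gains."""
    diffs = [e - c for e, c in zip(experimental, control)]
    standard_error = stdev(diffs) / sqrt(len(diffs))
    return mean(diffs) / standard_error, len(diffs) - 1


# Hypothetical class-mean gains for four matched teacher pairs
# (illustrative values only, NOT the data of the study).
exp_gains = [8.0, 6.5, 9.0, 7.0]
ctl_gains = [6.0, 5.5, 6.0, 6.5]
t, df = matched_pairs_t(exp_gains, ctl_gains)
print(f"t = {t:.2f}, df = {df}")
```

With the 14 matched pairs of the study the same function would return 13 degrees of freedom; each class contributes a single score, which is what makes this criterion more conservative than the student-level t tests.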

The posttest mean scores for the 14 
experimental classes and the 14 control 
classes were compared by analysis of 
covariance. The posttest mean scores 
were adjusted for the pretest mean 
scores. The resulting F was 4.40, 
df = 1/25, p < .05.² 
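
The sign test reported in Footnote 2 (12 positive and 2 negative differences among the 14 matched pairs) can be checked with an exact binomial calculation. The sketch below is an assumption about how such a test is usually computed, not a record of the author's procedure; the function name is hypothetical, and the footnote does not say whether a one- or two-tailed probability was intended, so both are returned.

```python
from math import comb


def sign_test_p(n_positive, n_negative):
    """Exact binomial sign test under H0: P(positive) = 0.5.

    Returns (one_tailed, two_tailed) probabilities for a split at least
    as extreme as the one observed.
    """
    n = n_positive + n_negative
    k = max(n_positive, n_negative)
    one_tailed = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    two_tailed = min(1.0, 2 * one_tailed)
    return one_tailed, two_tailed


one_tailed, two_tailed = sign_test_p(12, 2)
print(f"one-tailed p = {one_tailed:.4f}, two-tailed p = {two_tailed:.4f}")
```

For a 12 to 2 split the one-tailed probability is 106/16384, about .006, and the two-tailed probability is about .013, so the .01 level quoted in the footnote corresponds to the one-tailed value.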

Table 3 presents the analysis of the 
affectivity ratings of the secondary 
school students toward the subject mat- 
ter they studied during the semester. 
No significant difference was found 
(p > .05, two-tailed) between the entire 
experimental and entire control groups. 
However, when subgroup means were 
compared, the English students in the 
experimental group scored significantly 
lower (p < .01, two-tailed) than did the 
English students in the control group. 


2 However, from an inspection of graphs of 
these scores and the difference scores, it is 
questionable whether or not the experimental 
group and control group were each normally 
distributed. From the difference scores, a sign 
test was computed between the matched pairs 
in the experimental group and in the control 
group. Of the 14 matched pairs, 12 of the 
differences are positive and 2 of the differences 
are negative. This is statistically significant at 
the .01 level. 


TABLE 3 

AFFECTIVITY TOWARD SUBJECT MATTER FOR THE EXPERIMENTAL AND CONTROL GROUPS 

Groups 
Entire experimental group 
Entire control group 
Social studies, experimental 
Social studies, control 
Government and history, experimental 2.61 
Government and history, control 2.62 
English, experimental 3.39 
English, control 81 

Note. Possible scores ranged from 1 (like very much) to 5 (dislike very much). 
*** Significant at .01 level, two-tailed. 


DISCUSSION 


A statistically significant difference 
between the entire experimental and con- 
trol groups appeared when the achieve- 
ment test data were analyzed by any of 
the statistical procedures mentioned 
above. The interpretation of the above 
results must first include a consideration 
of some limitations that were inherent 
in the study. 

The student teachers in the experi- 
mental group were taught educational 
psychology in a small group situation 
rather than in a large lecture situation 
such as was used with the members of 
the control group. There is, therefore, a 
possibility that the Hawthorne effect 
(Roethlisberger & Dickson, 1939) was 
in operation in this study and that sim- 
ply giving personal attention to the 
members of the experimental group may 
have facilitated their performance. In- 
tact groups were used in this study. 
However, experimental and control 
groups each contained samples of each 
of the same five schools in operation. 
Further, when one compares the experi- 
mental group of 14 student teachers per 
se with the control group of 14 student 
teachers, use of intact groups is some- 
what more defensible, because each in- 
tact group then provided only one meas- 
ure for the entire group instead of a 
number of measures equal to the N of 
the group. Standardized achievement 
tests were used as the bases for measur- 
ing the results of the experiment. The 
disadvantages of using standardized 
achievement tests as measures of class- 
room learning are well known (Thorn- 
dike & Hagen, 1961, pp. 310-314). Two 
of the more important disadvantages of 
standardized tests are as follows: (a) 
the test may well lack validity and be 
insensitive to some important educa- 
tional process and product outcomes of 
classroom teaching,³ and (b) use of 
standardized tests as measures of teach- 
ing outcomes may imply that the stand- 
ardized test is a measure of good teach- 
ing or of teaching effectiveness. 

To measure the effects of the set used 
in this study, well-constructed stand- 
ardized achievement tests were appro- 
priate product criteria. The second of 
the above criticisms does not apply to 
this study because no attempt is made 
here to state that improvement on a 
standardized achievement test is evi- 
dence of “good teaching.” An evaluation 
of teaching must include many complex 
learning processes and learning products. 
It is suggested here that teaching can 
be studied separately from the value 
judgment regarding which directions 
classroom learning should pursue. 


3 For a discussion of process, product, and 
presage goals of teacher education, see pages 
1482-1486 of the Encyclopedia of Educational 
Research (Mitzel, 1960). 

Another limitation is that the sec- 
ondary school students of the study 
represented a middle class socioeconomic 
level. All five schools of this study were 
located in West Los Angeles. 

With the previous limitations in mind, 
the results of this study are interpreted 
to provide evidence that the concept of 
set can be of use in the teaching of social 
science material to middle class second- 
ary school students. 

How this set may have affected be- 
havior presents an interesting problem. 
The teachers in the experimental group 
were given some motivation and direc- 
tion not made available to the members 
of the control group. The direction, 
namely to teach for pupil gain, was rele- 
vant to the criterion measure that was 
used in this study. The set probably 
contained a cue factor. The cue followed 
from making explicit the objective to be 
obtained by the end of the semester. 
The objective probably conveyed in- 
formation about the kinds of subject 
matter which would be sampled on the 
pretest and on the posttest. 

The results of this study are con- 
sistent with the operant conditioning 
paradigm mentioned in the introduction 
in that the student’s behavior was 
shaped by making reinforcement con- 
tingent upon certain responses. Cer- 
tainly, reinforcement for the student 
teachers was contingent upon their gain- 
ing pupil improvement on a standardized 
achievement test. The student teachers 
apparently “shaped” the behavior of 
their students without developing a dis- 
like for the subject matter of American 
government, history, and social studies. 
However, the English teachers in this 
study, who achieved the highest pupil 
gain of any of the three subgroups of 
student teachers of the experimental 
group, also produced a negative effect 
upon their students’ attitudes toward the 
subject matter of English. The study 
implies that by making goals evident 
and explicit, rather than vague and 
unverbalized, the behavior of teachers 
can be changed, but sometimes with un- 
desirable effects upon the attitudes of 
the students. 


THE EFFECTS OF LETTER-NAME KNOWLEDGE 
ON LEARNING TO READ A WORD LIST IN 
KINDERGARTEN CHILDREN ¹ 

SIEGMAR MUEHL 
Iowa Child Welfare Research Station 


The effects of learning letter names on the subsequent acquisition of 
word-name associations were investigated utilizing 87 kindergarten Ss 
divided into 2 groups on the basis of their pretraining experience. 
A “relevant group” learned names for 3 letters that subsequently ap- 
peared as the critical stimuli in 3 nonsense words that were paired with 
pictures of familiar objects for paired-associate presentation. An “irrele- 
vant group” learned names for letters that did not appear in the words 
on the transfer task. Knowledge of relevant letter names produced inter- 
ference in the word-naming task, a finding which was interpreted in rela- 
tion to evidence from classroom reading research. 


Research indicates that kindergarten 
children learn to discriminate words and 
to associate word names on the basis 
of details related to unique letter shape, 
letter position in the word, or familiarity 
with stimulus aspects of the letter sym- 
bols (Gates & Boeker, 1923; Meek, 
1925; Muehl, 1960, 1961). When chil- 
dren in these studies were asked how 
they recognized a word, their responses 
indicated that they apparently learned 
to associate a familiar verbal label with 
some aspect of the word configuration; 
e.g., the “dot” over the letter i, the 
“cross” on the letter t. 

If this verbal labeling process is the 
basis for mediating word discrimination 
and name association, then providing 
children with a consistent set of labels 
in the form of letter names should facili- 
tate this discrimination and association 
process rather than leaving the child to 
evolve his own set of labels. Support 
for this assumption is found in a series 
of studies summarized by Durrell 
(1958). He reported that knowledge of 
letter names was the best predictor of 
subsequent word recognition and read- 
ing performance in first grade children. 


1 The author wishes to thank Buford W. 
Garner, Superintendent of Schools, Iowa City; 
Charles Railsback and Donald Tvedt, Prin- 
cipals; and Rae Blanchard, Dorothy Brejcha, 
and Audrey Nelson, kindergarten teachers, 
whose cooperation made this research possible. 

Two areas of experimental research in 
psychology relate to this verbal labeling 
assumption. Stimulus pretraining stud- 
ies with preschool children have shown 
that the acquisition of distinctive names 
for similar visual stimuli produces learn- 
ing facilitation on a transfer task in 
which the same stimuli were associated 
with nonverbal responses (Cantor, 1955; 
Norcross & Spiker, 1957). A study in 
associative transfer, using the same 
A-B . . . A-C paradigm as in the stimu- 
lus pretraining studies, showed an inter- 
ference effect when the visual, or A, 
stimuli were relatively dissimilar and the 
first and second task responses were both 
verbal (Spiker, 1960). 

The assumption that letter-name 
knowledge would facilitate subsequent 
word discrimination and name asso- 
ciation is not consistently supported by 
the above findings. Letter-name learn- 
ing involves associating verbal labels 
with visual stimuli. Subsequent meaning 
association involves relating a different 
set of verbal labels with stimuli similar 
to, but not identical with, the letters 
themselves, i.e., words containing the 
letters. These two tasks represent an 
A-B . . . A’-C paradigm. Facilitation 
should result in the A’-C task if the 
stimuli produced by responding with the 
distinctive letter names increase the 
distinctiveness of the words and/or 
mediate the second-task response. In- 
terference should result if responding 
with the letter names in the A’-C task 
competes with responding with the word 
meanings. Gibson’s (1941) study with 
adults using an A-B ... A’-C para- 
digm with verbal responses in both tasks 
reported interference in the A’-C task. 

The purpose of the present study was 
to assess these alternative assumptions 
by investigating the effects of pretrain- 
ing in the acquisition of letter names 
on learning a subsequent word list in 
kindergarten children. 


METHOD 


Pretraining 


Two groups of kindergarten children were 
differentiated by the type of letter-name pre- 
training each received prior to learning the 
word list or reading task. The pretraining 
paradigm is illustrated by Replication 1 in 
Table 1. The letters were typed in lower case 
on a primary-style typewriter and mounted on 
2" × 3" white cards. Two packs of cards were 
formed, one for each of the groups. The packs 
were the same except for the letters. Each pack 
contained three familiarization and nine prac- 
tice cards. The familiarization cards displayed 
each of the three letters separately. The prac- 
tice cards displayed two of the letters mounted 
at the opposite ends of the long dimension of 
the card. The three letters were presented in all 
possible pairings (including same letter pair- 
ings) giving a total of nine cards. The letters on 
the practice cards were presented in pairs to 
facilitate the letter-name learning. Pretraining 
consisted of three stages: (a) making the letter 
names available to the subject, (b) familiariz- 
ing the subject with the letter symbols and the 
names, and (c) practice in learning to associate 
the letter symbols and names. In the first 
stage the subject was instructed to repeat the 
letter names after the experimenter. After two 
repetitions the subject was asked to recall the 
names. If the subject could not, he was asked 
to repeat the letter names once more. In the 
second stage the experimenter gave two trials 
with the three familiarization cards, showing 
and naming each letter symbol and asking the 
subject to repeat the name. In the third stage 
the subject tried to name each of the letters 
which appeared on a practice card as the ex- 
perimenter pointed to them. The subject won 
the card if he could name both letters correctly. 
The experimenter named each letter whether 
the subject failed to respond, responded incor- 
rectly, or responded correctly. The subject’s 
object was to see how many cards he could 
win with each run through the pack. Once 
through the pack constituted a trial. The cards 
were shuffled at the beginning of each trial. 
Pretraining continued until the subject had 
won seven of nine cards in two consecutive 
trials, or for a maximum of seven trials. Sub- 
jects reaching the seven of nine criterion were 
assigned to a Criterion subgroup; subjects not 
reaching the criterion were assigned to a Non- 
criterion subgroup. 


TABLE 1 

PARADIGM TO ILLUSTRATE THE EXPERIMENTAL DESIGN 

Letter-name pretraining groups     Reading task (Pictures) 

Replication 1 
  Relevant    f, m, g              yfl - boat 
  Irrelevant  j, u, d              yml - sled 
                                   ygl - cake 

Replication 2 
  Relevant    j, u, d              yjl - sled 
  Irrelevant  f, m, g              yul - boat 
                                   ydl - cake 
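
The construction of the practice pack and the stopping rule can be made concrete with a short sketch. Two readings are assumed here and are not stated outright in the text: the pairings are treated as ordered (left and right positions on the card), since only that reading yields nine cards from three letters, and "seven of nine" is read as at least seven cards won in each of two consecutive trials. The function names are hypothetical.

```python
from itertools import product


def practice_cards(letters):
    """All ordered (left, right) pairings, same-letter pairs included."""
    return list(product(letters, repeat=2))


def reached_criterion(cards_won_per_trial, needed=7, max_trials=7):
    """True if two consecutive trials, within max_trials, each reach `needed`."""
    trials = cards_won_per_trial[:max_trials]
    return any(a >= needed and b >= needed for a, b in zip(trials, trials[1:]))


cards = practice_cards(["f", "m", "g"])
print(len(cards))                          # number of practice cards
print(reached_criterion([3, 5, 7, 8]))     # criterion met on trials 3 and 4
print(reached_criterion([7, 4, 7, 4, 7]))  # never two consecutive successes
```

Under this reading a subject who wins seven or more cards only on alternating trials would be placed in the Noncriterion subgroup after the seventh trial.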

Despite efforts to equate the difficulty of the 
letter-naming tasks, learning the letter names 
for f, m, and g appeared to be more difficult 
than learning j, u, and d. In order to avoid 
bias that might occur as a result of this ap- 
parent difference in task difficulty, a counter- 
balancing replication was run. The pretraining 
paradigm for Replication 2 is shown in Table 1. 


Reading Task 


The reading task paradigm for Replications 1 
and 2 is shown in Table 1. In each replication 
the letter stimuli critical for discriminating 
among the nonsense words were the same let- 
ters used in the relevant letter-name pretrain- 
ing. The three nonsense words in each replica- 
tion were paired with colored pictures of 
common objects as indicated in Table 1. The 
words were typed in primary-style, lower-case 
type and mounted with the pictures on white 
plastic cards for paired-associate presentation 
using a Hunter Card Master. This apparatus 
has been described in detail in an earlier study 
(Muehl, 1961). Four orders of the three word- 
picture pairs appeared in the reading task. Two 
of these orders presented consecutively consti- 
tuted a two-trial block. A blank card appeared 
at the end of a two-trial block. The following 
presentation intervals were used: a 4-second 
anticipation interval when the word at the left 
of the card was exposed; a 3-second joint- 
presentation interval when both the word and 
the picture were exposed; and a 2-second be- 
tween-item interval when the card aperture was 
covered. 

The reading task was administered im- 
mediately following pretraining. The subject 
was told that this was a game called “name- 
the-picture.” He was first shown each of the 
three pictures with the word side of the aper- 
ture covered, then was asked to name each 
picture. Next the subject was given a familiar- 
ization trial with the word-picture pairs, was 
told to look carefully at each word, and, when 
the picture appeared, was instructed that the 
word went with the picture. This instruction 
was repeated with the experimenter pointing 
first to the word and then to the picture. Upon 
completion of the familiarization trial the ex- 
perimenter told the subject that the game was 
to look at the words and try to name the pic- 
ture before the picture appeared. Finally, the 
subject was asked to recall the picture names. 

If the subject made a correct response during 
the anticipation interval, he moved a bead 
across a counting frame located in front of the 
Card Master below the display window. If the 
subject made an incorrect response, or failed to 
respond, the experimenter named the correct 
picture during the joint-presentation interval. 
If the subject failed to respond during the an- 
ticipation interval, the experimenter pointed at 
the next word presented and said, “Guess the 
picture.” When the blank card appeared be- 
tween two trial blocks, the experimenter re- 
minded the subject to look carefully at the 
words, and try to name a picture each time a 
word appeared. 

All subjects received 16 trials on the reading 
task. The subjects’ responses during an an- 
ticipation interval were scored either as correct, 
as errors, or if no response was made, as omis- 
sions. 
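
A small worked calculation may help in reading the score ranges reported later. It assumes that each of the 16 trials consists of one order of the three word-picture pairs, so that the maximum correct-response score is 48, and it counts only the three stated presentation intervals; the duration of the blank card and of the instructions is not given in the text and is left out. The constant and function names are hypothetical.

```python
from collections import Counter

ANTICIPATION_S = 4    # word alone is exposed
JOINT_S = 3           # word and picture exposed together
BETWEEN_ITEM_S = 2    # card aperture covered
PAIRS_PER_TRIAL = 3
TRIALS = 16

# Number of anticipation intervals, which is also the maximum correct score.
items = PAIRS_PER_TRIAL * TRIALS
# Timed exposure only; blank cards and instructions are not included.
timed_seconds = items * (ANTICIPATION_S + JOINT_S + BETWEEN_ITEM_S)


def tally(responses):
    """Count anticipation-interval responses by scoring category."""
    allowed = {"correct", "error", "omission"}
    unknown = set(responses) - allowed
    if unknown:
        raise ValueError(f"unexpected categories: {sorted(unknown)}")
    counts = Counter(responses)
    return {category: counts[category] for category in sorted(allowed)}


print(items, timed_seconds / 60)
print(tally(["correct", "omission", "correct"]))
```

On these assumptions the timed portion of the reading task occupies a little over 7 minutes of each session.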


Subjects 


The subjects were 87 children from four 
kindergarten classes in the public schools of 
Iowa City. They were assigned in a rotating 
order to the Relevant and Irrelevant pretrain- 
ing groups. Forty-one subjects were assigned 
in this fashion in Replication 1; 46 in Replica- 
tion 2. Two additional subjects were elimi- 
nated for failing to follow instructions in the 
reading task. The mean age was 67.1 months 
with a standard deviation of 3.7 months. There 
were no significant age differences between the 
replications or between the Relevant and Ir- 
relevant groups within the replications. Test- 
ing was done in the public schools during No- 
vember and December. Testing time ranged 
from 15 to 25 minutes. 


RESULTS 


Pretraining 


Comparing subjects pretrained with 
letter names f, m, and g with those pre- 
trained with j, u, and d in both replica- 
tions, 19 and 27, respectively, reached 
the pretraining criterion; 27 and 14, 
respectively, did not reach the criterion. 
A chi square test caused rejection of the 
hypothesis of independence (p < .025), 
indicating greater difficulty in learning 
f, m, g letter names on the basis of sam- 
ple proportions. Table 2 summarizes the 
pretraining results for the combined 
replications. Comparisons between Rele- 
vant and Irrelevant group means for the 
Criterion and Noncriterion subgroups 
showed no reliable differences. 
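
The chi square test on the criterion frequencies can be reproduced from the four counts given above. The sketch assumes a Pearson chi square for a 2 x 2 table without a continuity correction; the text does not say whether a correction was applied, and the function name is hypothetical. For one degree of freedom the upper-tail probability can be obtained from the complementary error function, so no statistics library is needed.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi2, p), where p is the upper-tail probability for 1 df.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = erfc(sqrt(chi2 / 2))
    return chi2, p


# Rows: letter set (f, m, g; then j, u, d).
# Columns: reached criterion, did not reach criterion.
chi2, p = chi_square_2x2(19, 27, 27, 14)
print(f"chi square = {chi2:.2f}, p = {p:.3f}")
```

The uncorrected statistic comes to about 5.24 with a probability of about .022, which agrees with the reported p < .025.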


TABLE 2 

PRETRAINING RESULTS FOR THE COMBINED REPLICATIONS (CRs) 

                        Pretraining groups 
Criterion subgroups     Relevant     Irrelevant 

Criterion 
  M (Trials) 
  SD 
  N 
Noncriterion 
  M (CRs) 
  SD 
  N 


Reading Task 


Comparison of the correct response 
and omission data in the two replications 
showed no reliable differences or inter- 
actions between the replications. The 
following analyses, therefore, are based 
on the combined replications. 

Table 3 summarizes the correct re- 
sponse data. Hartley’s test (Walker & 
Lev, 1953) for homogeneity of variance 
for the overall performance between the 
combined Relevant and combined Irrele- 
vant groups was significant (F=2.11; 
p < .05), indicating greater variability 
on the correct response measure for the 
Irrelevant groups. Comparing the score 
ranges (Relevant, 8-30; Irrelevant, 8— 
41) and distributions for the groups 
revealed similarity except for the higher 
scores. Only one Relevant subject scored 
30; seven Irrelevant subjects scored 30 
or above. 

A Lindquist (1953) Type III analysis 
of variance of the correct response data 
in Table 3 over trials indicated no sig- 
nificant interaction effects among the 
variables. The mean differences between 
Relevant versus Irrelevant, and Cri- 
terion (C) versus Noncriterion (N-C) 
pretraining groups were not significant 
(p=.025 to adjust for effects of hetero- 
geneity of variance). The trials or learn- 
ing effect was significant (p < .001). 


TABLE 3 

SUMMARY OF CORRECT RESPONSE AND OMISSIONS DATA ON THE READING TASK 

Pretraining                   Correct responses     Omissions 
groups                  N     M        SD           M     SD 

Relevant 
  C                           19.09    4.94 
  N-C                         17.81 
  Combined Relevant           18.48 
Irrelevant 
  C                           22.61    8.20 
  N-C                         19.20    7.30 
  Combined Irrelevant   43    21.02    7.99 

* One subject dropped to obtain proportionality in 
analysis of variance. 

Table 3 summarizes the omissions 
data. A Lindquist (1953) Type III 
analysis of variance of these data over 
trials showed none of the interaction 
effects to be significant. The mean dif- 
ferences between the Relevant versus 
Irrelevant, and the Criterion (C) versus 
Noncriterion (N-C) pretraining groups 
were significant (p < .05), as was the 
trials or learning effect (p < .001). 


DISCUSSION 


The results of the present study sup- 
port the assumption that the acquisition 
of letter names by kindergarten-aged 
children interferes with subsequent per- 
formance in learning to associate picture 
names with nonsense words containing 
these same letters as the critical stimuli. 
This interference effect was reflected in 
the significantly reduced variability and 
restricted range of higher scores in the 
correct-response measure for groups re- 
ceiving the relevant letter-name pre- 
training. These same Relevant groups 
tended to make fewer correct responses 
than the Irrelevant groups; the differ- 
ence, however, did not reach significance. 
Evidence for interference was also found 
in the omissions measure. Significantly 
more omissions were made by the Rele- 
vant groups over all trials. These find- 
ings are consistent with the results of 
associative transfer studies (Gibson, 
1941; Spiker, 1960). 

Observation of the children’s perform- 
ance on the reading task provided a clue 
to explain the frequency of omissions. 
A few children in the Relevant groups 
responded, overtly, with the letter 
names, rather than the picture names, 
to the words on the reading task. This 
typically occurred early in learning when 
the majority of children in the Relevant 
groups were responding with omissions. 
Presumably most subjects quickly 
learned that the letter-name response 
was inappropriate to the words. How- 
ever, since the letter-name was the domi- 
nant response to the critical stimulus 
element in the words, this response had 
to be displaced by the acquisition of the 
appropriate word-name association. The 
omissions resulted, therefore, from com- 
petitive blocking (Gibson, 1941) of the 
picture-names by the more dominant 
letter-name responses. 

These findings require some interpre- 
tation in relation to the earlier assump- 
tion that verbal labeling is the basis 
for mediating word discrimination and 
name association in kindergarten-aged 
children, and for Durrell’s (1958) find- 
ings that letter-name knowledge was 
related to word recognition and reading 
performance in first grade children. 

One way to reconcile the verbal label-
ing assumption with the findings of the
present study is to suppose that the child’s labeling
response given in answer to an adult’s
question bears no relation to his actual
identifying response when presented
with printed words. This identifying
response may be primarily nonverbal 
(Kurtz, 1955)—controlled by some fa- 
miliar or characteristic feature of the 
word. In the present study, although 
the subjects in the Relevant letter-name 
groups became familiar with the visual 
characteristics of the letters, they also 
learned to give a verbal response when- 
ever these letters appeared. This re- 
sulted in a net interference effect when 
the words containing these letters had 
to be associated with different verbal 
labels. 

A second interpretation of the verbal-
labeling assumption is that the majority
of kindergarten-aged children have not
yet learned how to utilize the informa- 
tion provided them by the letter-name 
labels. This explanation is supported by 
the exceptional performance of one Rele- 
vant group subject in the preliminary 
testing. In the first trial block of the 
reading task this subject consistently re- 
sponded with the letter name. In the 
second trial block the subject began to 
mediate the letter response and picture 
name by using a verbal mediator; e.g., 
“f goes with boat.” By the third trial 
block the subject adopted this mediating 
procedure with each word presented and 
quickly obtained a perfect score. No 
other subject, in over 100 tested, overtly 
used this method of learning the word- 
name association (since the effect of 
relevant letter-name learning was to pro- 
duce interference, it is likely that the 
method was not used covertly). 

These observations suggest that these 
kindergarten children did not use the 
letter names due to their lack of skill in 
providing meaningful verbal mediators 
(Kendler & Kendler, 1962). Children a 
year older may well be able to provide 
this verbal mediation which could, in 
part, account for the results reported by 
Durrell (1958) for his first graders. 

Finally, it is possible that this relation 
between letter-name knowledge and 
reading performance (Durrell, 1958) re- 
sulted from the sound similarity between 
most letter names and their phonic value 
in word pronunciation. In the present 
study there was no relation between the 
sounds of the letter names and the 
sounds in the picture-name words. If 
ybl had been associated with boat, the 
sound similarity between the letter name 
b and the sound “buh” at the beginning 
of the word might have provided a homo- 
phonic mediating link (Riess, 1946). 
THE DIMENSIONS OF AN OBJECTIVE MEASURE OF
ACADEMIC SELF-CONCEPT 1


DAVID A. PAYNE AND WILLIAM W. FARQUHAR


Michigan State University 


Following from the assumption that a student’s self-concept is a func- 
tionally limiting and facilitating factor in academic performance which 
interacts with motivation, a 119-item instrument (the Word Rating List) 
was developed. The instrument was a subtest of an objective motiva-
tional test battery (the M scales). Students rated 1-, 2-, or 3-word
concepts on a 4-point scale as they thought their teachers would in 
describing them as students. Item discrimination was determined by 
chi square analyses of the responses of statistically defined under- and 
over-achieving 11th graders. 48 items remained after cross-validation for 
each sex, with 35 in common, which had Hoyt’s analysis of variance 
reliabilities in the .90s. Multiple scalogram analysis yielded a global 
dimension for both males and females. Additionally, 3 male and 4 female 
interpretable dimensions were found. 


It has been hypothesized that a stu- 
dent’s self-concept is a functionally lim- 
iting and facilitating factor in academic 
performance. Recent research supports 
this premise (Davidson & Lang, 1960; 
Roth, 1959; Shaw, Edson, & Bell, 1960). 


However, a review of the literature indi- 
cates: (a) lack of a sound theoretical 
framework with which to interpret em-
pirical results, and (b) failure of many
investigators to make their self-concept 
referents even plausibly relevant to the 
behavior under study (Wylie, 1961). 

The purpose of the present study was 
to: (a) develop a theoretically based 
objective instrument which purported to 
measure the academic self-concepts of 
high and low motivated students, and 
(b) to determine the psychological di-
mensions of such an instrument (Payne,
1961). 


1 The research reported herein is based in 
part on the first author’s PhD dissertation 
submitted to Michigan State University, and 
was supported by United States Office of Edu- 
cation funds. Appreciation is gratefully ex- 
pressed to his committee, W. W. Farquhar 
(Chairman), Willard G. Warrington, Bill L.
Kell, and Gregory A. Miller.


THEORY 

Drawing upon the symbolic interac- 
tion framework of social psychology, and 
phenomenological field theory, Brook- 
over (1959) has presented theoretical 
tenets basic to the present research. 
These may be summarized as follows: 
(a) the student learns what he perceives 
he is able to learn; and (b) significant
others, particularly teachers, have im- 
portant influences on the development 
of a student’s self-concept. Influences
are in the form of expectancies, which in
turn affect the student’s ability to per-
form in the academic setting. Influences
are assimilated by a perceptual mecha-
nism, the result of which is the looking-
glass-self. In this research academic
motivation is considered that which ini-
tiates and sustains learning; the looking-
glass-self, then, may be viewed as a
subset of intervening variables which
influences scholarship. Granting the
above tenets, it should be possible to 
develop measures of the looking-glass- 
self which discriminate between high 
and low motivated students. Martire’s 
(1956) research would indicate that the 
latter is not an untenable position. 
METHOD


Instrumentation 


Items for instrument construction were de- 
veloped based upon (a) the above academic 
self-concept theory; (b) a review of the self- 
concept literature; and (c) a review of the per- 
sonality, motivational, and intellectual char- 
acteristics of students representing academic 
extremes.2 Items consisted of one, two, or 
three word concepts and phrases, e.g., logical,
purposeful, below average, easily distracted,
and competitive. Each student was asked to
rate each of these concepts or words on a four-
point scale (never, sometimes, usually, and
always), as he thought his teachers would
rate the words in describing him as a student.
The resulting instrument, the Word Rating 
List (WRL), consisted of 119 concepts. The 
WRL was one of several experimental inven-
tories (M scales) developed for inclusion in a
battery of tests designed to measure motiva-
tion for academic achievement.3


Sample Selection 


Basic to the present study was the assump- 
tion that statistically defined underachieving 
students and overachieving students differ sig- 
nificantly in motivation, which in turn would 
be reflected in their academic self-concepts. 
The population consisted of 4,200 eleventh 
grade students from nine high schools in eight 
Michigan cities. Schools were selected, a 
priori, to represent the full range of socioeco- 
nomic environments. From this group under- 
achievers and overachievers were identified 
using the Two Stage Regression Model re-
ported by Farquhar.4 Individuals who varied
more than one SE_est from the first to the
second administration of an aptitude measure,
were eliminated from the study. Type I error 
(rejecting when should have accepted) was 

2 The assistance of Marion D. Thorpe, 
Wayne D. Chubb, and Ronald G. Taylor in 
the initial stages of instrument development is 
gratefully acknowledged. 

3 The battery was developed as part of a
larger project, “A Comprehensive Study of the 
Motivational Factors in the Achievement of 
Eleventh Grade Students,” under the direction 
of W. W. Farquhar of Michigan State Uni- 
versity, pursuant to a contract with the United 
States Office of Education (Project No. 846). 

4 Reported at the 1961 meetings of American
Personnel and Guidance Association in Denver, 
“A Comparison of Techniques Used in Select- 
ing Under- and Over-Achieving Students,” 
(mimeographed). 


tolerated over Type II error (accepting when 
should have rejected) in the final classification 
of the criterion groups. Overachievers were 
defined as falling at or above one SE_est relative
to the linear regression of aptitude (Differen- 
tial Aptitude Tests, Verbal Reasoning subscale) 
on achievement (cumulative grade point aver- 
age for academic subjects for tenth and 
eleventh grades). Conversely, underachievers 
were defined as falling at least one SE_est below the
regression line. Regression equations were de- 
veloped separately for each sex for each of the 
nine schools because of the lack of compara- 
bility in grading milieu. 
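
The classification rule described above can be sketched roughly as follows. This is an illustration only, not the Two Stage Regression Model itself: it assumes a single group (one sex within one school), predicts grade point average from aptitude by ordinary least squares, and takes SE_est to be the standard error of estimate of that line. The function names and labels are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares line predicting y from x; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def classify_achievers(aptitude, gpa):
    """Label each student 'over', 'under', or 'expected' (hypothetical labels).

    Assumption: a student is a discrepant achiever when the residual from the
    regression line is at least one standard error of estimate in size.
    """
    slope, intercept = fit_line(aptitude, gpa)
    residuals = [g - (intercept + slope * a) for a, g in zip(aptitude, gpa)]
    n = len(residuals)
    # Standard error of estimate: residual sum of squares over n - 2 df.
    se_est = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    labels = []
    for r in residuals:
        if r >= se_est:
            labels.append("over")
        elif r <= -se_est:
            labels.append("under")
        else:
            labels.append("expected")
    return labels, se_est
```

In use, the function would be called once per sex within each school, mirroring the separate regression equations reported above.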

Reliability estimates of the achievement 
criterion were +.75 for males, and +.80 for 
females. 

The total number of individuals in each cate- 
gory was randomly dichotomized to provide 
validation and cross-validation groups. The 
validation samples contained 87 male over- 
achievers and 62 male underachievers, and 95 
female overachievers and 84 underachievers. 
The cross-validation samples contained 80 male 
overachievers and 69 male underachievers, and 
96 female overachievers and 86 underachievers. 
Equivalent numbers of individuals are not 
found in each category because of sample loss 
through poor test motivation, inability to fol- 
low directions, absenteeism, and attrition. 


Analysis Procedures 


The four-point WRL rating scale was ar-
bitrarily dichotomized (Never, Sometimes
scored 0; Usually, Always scored 1). Dis-
crimination was determined by a chi square
analysis of underachievers’ and overachievers’
responses to each item. Level of significance
was set at .10 for both validation and cross- 
validation. 
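
As a rough illustration of this item analysis, the sketch below dichotomizes the ratings and computes a chi square for the resulting 2 x 2 table (criterion group by dichotomized response). It assumes the uncorrected Pearson statistic with 1 degree of freedom; the article does not state whether a continuity correction was applied, and the function names are hypothetical.

```python
import math


def dichotomize(rating):
    """Never, Sometimes -> 0; Usually, Always -> 1."""
    return 1 if rating.lower() in ("usually", "always") else 0


def chi_square_2x2(a, b, c, d):
    """Uncorrected Pearson chi square and p value for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # a row or column is empty; no evidence of association
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Upper-tail probability of a chi square with 1 df.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


def item_discriminates(over_ratings, under_ratings, alpha=0.10):
    """True when the item separates the two criterion groups at the given level."""
    over = [dichotomize(r) for r in over_ratings]
    under = [dichotomize(r) for r in under_ratings]
    a, b = sum(over), len(over) - sum(over)
    c, d = sum(under), len(under) - sum(under)
    chi2, p = chi_square_2x2(a, b, c, d)
    return p < alpha, chi2, p
```

An item would be retained only if it met the .10 level in both the validation and the cross-validation samples.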

Several multivariate analytic procedures are 
available to determine the underlying structure 
of data. Lingoes (1960) found that compared 
to multiple factor analysis, simple Guttman 
analysis, and Loevinger’s method of homo- 
geneous tests, multiple scalogram analysis 
(MSA) was the most efficient method of an- 
alyzing a binary response matrix. For this 
reason MSA was used. Multiple scalogram 
analysis is a nonparametric data reduction 
technique for maximizing interitem reliabilities 
such that both subjects and items are uniquely 
ordered in one or more unidimensional or 
Guttman type scales from a single analysis. 
The degree to which a given scale is uni- 
dimensional is determined by its reproducibility 
(R), i.e., unidimensionality exists if each sub- 
ject’s response pattern is reproducible on the 
basis of knowledge of individual total score 
and item ordering. Error percentage in item-
subject matrix reproduction determines di-
mension reproducibility (0 ≤ R ≤ 1). For
the present study, a 20% criterion was used, 
permitting reproducibilities to range from .80 
to .60. Underachievers and overachievers for 
both validation and cross-validation were com- 
bined for a separate MSA of each sex (298 
males, 312 females). Discrepant achievers were 
pooled to maximize results. 
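
To make the reproducibility coefficient concrete, the following sketch computes R for a single candidate scale. It is not Lingoes' MSA procedure, which also sorts items into several scales. It assumes one common convention for counting errors: items are ordered by popularity, each subject's ideal pattern is inferred from his total score, and every departure from that pattern counts as one error.

```python
def reproducibility(matrix):
    """Coefficient of reproducibility for a subjects-by-items 0/1 matrix.

    R = 1 - errors / (subjects * items), where an error is any response that
    differs from the ideal Guttman pattern implied by the subject's total score.
    """
    n_subjects = len(matrix)
    n_items = len(matrix[0])
    # Order items from most to least frequently endorsed.
    totals = [sum(row[j] for row in matrix) for j in range(n_items)]
    order = sorted(range(n_items), key=lambda j: -totals[j])
    errors = 0
    for row in matrix:
        score = sum(row)
        for rank, j in enumerate(order):
            # A subject with total score s should endorse the s most popular items.
            predicted = 1 if rank < score else 0
            if row[j] != predicted:
                errors += 1
    return 1 - errors / (n_subjects * n_items)
```

A perfectly cumulative response matrix gives R = 1; each response that departs from the ideal pattern lowers R by 1 / (subjects x items).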


RESULTS 


Results of the item analyses indicated 
that 67 male and 87 female items met 
the .10 significance level criterion at 
validation. Again using a .10 signifi-
cance level, cross-validation yielded 48
male and 48 female items, with 35 items 
in common. 

Hoyt’s analysis of variance reliability 
estimates of the 48 cross-validated items 
ranged from .90 to .93 for various male 
samples, with a median value of .92. For 
the 48 cross-validated female items, 
estimates ranged from .88 to .93, with a
median value of .90.
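
For readers unfamiliar with Hoyt's procedure, a minimal sketch follows. It treats the data as a subjects-by-items analysis of variance without replication and estimates reliability as 1 minus the ratio of the residual mean square to the between-subjects mean square. The function name and input layout are assumptions for illustration; the computations actually used in the study are not reported in this article.

```python
def hoyt_reliability(scores):
    """Hoyt's analysis of variance reliability for a subjects-by-items score matrix.

    Requires at least two subjects and two items, and some variation among
    subject totals (otherwise the between-subjects mean square is zero).
    """
    n = len(scores)        # subjects
    k = len(scores[0])     # items
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_items = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_subjects - ss_items
    ms_subjects = ss_subjects / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return 1 - ms_error / ms_subjects
```

The estimate obtained this way is algebraically the same as coefficient alpha computed on the same matrix.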

The following were considered in inter-
preting the MSA results: (a) self-con-
cept items have been shown to be asso-
ciated with achievement, (b) MSA
yields information about the ranking or
placement of individuals, and (c) items
scored in the negative direction (over-
achievers significantly disagreed) were
reflected to describe the positive direc-
tion so that interpretation was unipolar.
Therefore, MSA dimensions were inter-
preted in the direction students perceive
as being associated with high achieve-
ment.5


5 Detailed tabled results of the MSA for 
males and females (including WRL items, scor- 
ing directions and modal response proportions) 
have been deposited with the American Docu-
mentation Institute. Order Document No. 7210
from ADI Auxiliary Publications Project,
Photoduplication Service, Library of Congress;
Washington 25, D. C., remitting in advance
$1.25 for microfilm or $1.25 for photocopies.
Make checks payable to: Chief, Photoduplica- 
tion Service, Library of Congress. 


Male Dimensions 


Results from the MSA for the male 
items yielded four interpretable dimen-
sions. The dimensions had an average re-
producibility of .77. Two of the 48 cross-
validated items did not scale on any
dimension (“confident” and “con-
tented”). Labels and interpretative
emphases for the four male dimensions 
are summarized in Table 1. 

The first male dimension (D1) is gen-
eral or global because it contained 54%
of the scaled male items. The type of 
individual described feels that academic 
achievement can best be pursued by us- 
ing the usual academically sanctioned 
channels. He sees himself as “inter- 
ested,” “ambitious,” “careful,” “or- 
derly,” and “intelligent” in the major 
aspects of his classroom behavior. He is 
the kind of individual who fits the 
teacher’s stereotype of the good student.
Understanding and interest in subject
matter, however, appear to be super-
ficial. Academic motivation extends only to the
degree that knowledge permits gaining
acceptable grades.

The second dimension (D2) is similar
to the first, except that lack of ambition 
and initiative emerges. This type of 
student perceives that high achievement 
is gained by giving considerable thought 
to the practical ramifications of behavior 
before acting or reacting. Conformity is 
to the prevailing classroom norms. A 
recent study by Erb (1961) has pre-
sented evidence supporting the relation-
ship of conformity to achievement.

Evaluation of the item content of the 
third dimension (D3) implies self-initi-
ated learning. D3 describes a student
who wishes to master basic concepts 
which underlie the reasons for classroom 
presentations. He is also interested in 
practical application of knowledge. High 
achievement is associated with a self- 
concept characterized by “efficient” and 
“intellectual” studiousness. 
TABLE 1

SUMMARY DIMENSIONS DERIVED BY MULTIPLE SCALOGRAM ANALYSIS FROM THE WORD RATING
LIST REPRESENTING THE ACADEMIC SELF-CONCEPTS OF COMBINED MALE
UNDERACHIEVERS AND OVERACHIEVERS
(N = 298)

D1 Achievement via traditional academic role taking
   Sample items (Teachers feel that I am): Interested, Efficient, Ambitious, Intelligent
   Interpretive emphasis: Description of ambitious individual interested in learning
   subject matter required for securing grades.

D2 Achievement via situational conformity
   Sample items (Teachers feel that I am): Not different, Practical, Concerned, Not easily distracted
   Interpretive emphasis: Mode of achievement is by conformity to socially acceptable
   types of classroom behavior.

D3 Achievement via intrinsic motivation
   Sample items (Teachers feel that I am): Studious, Intellectual, Systematic, Competent
   Interpretive emphasis: Desire to know background, underlying concepts, and applications
   of subject matter. Mastery of material becomes motivation.

D4 Achievement via teacher conformity
   Sample items (Teachers feel that I am): Alert, Purposeful, A planner
   Interpretive emphasis: Alert and sensitive to teachers’ biases. Integrate these
   cues into activity.



The fourth male dimension (D4) char-
acterizes a student who sees himself as
“alert,” “a planner,” and “purposeful.”
These words connote organization. Such
an individual tends to be sensitive to
teacher cues, which are integrated into
academic activity. He makes a point of
using information about teachers’ biases
and preferences. D4 is similar to D2
because both emphasize aspects of con-
formity. The essential difference be-
tween D2 and D4 is whether the con-
forming behavior is self-initiated. Con-
formity of the students described by D2
would be a natural result of classroom
behavior. However, D4 describes a stu-
dent who purposefully initiates teacher-
sanctioned behavior in order to achieve.


Female Dimensions 


The MSA for females yielded five 
interpretable dimensions with an average
reproducibility of .66. One of the 48
cross-validated items, “ambitious,” did 
not scale. This failure is interesting in 
that ambition was particularly evident 
in the male academic self-concept. 

The female and male dimensions were 
similar in meaning; this would be ex- 
pected with 73% of the items in com- 
mon. Labels and interpretative empha- 
ses for the five female dimensions are 
summarized in Table 2. 

Similar to the male findings, the first
female dimension (D1) is “general” or
“global,” containing 62% of the scaled
items. D1 described an individual who
conforms to teacher’s wishes, follows di-
rections immediately, and is generally
responsive to teacher demands. Two
items concerned with distractibility also
had high loadings on D1. About half of
the items (12) overlap male D1, result-
ing in parallel interpretations, but with
a somewhat greater emphasis on non-
cognitive factors.

Interpretation of the item grouping in
the second female dimension (D2) is
similar to D3 above. This student per-
ceives her academic success as being the
result of “thorough,” “purposeful,”
“competent,” “systematic,” and “com-
petitive” classroom behavior. The char-
acteristics are interpreted as indicating
a motivation to seek mastery of subject
matter above requirements for a passing
grade.

The stability of the remaining three
female dimensions, although the repro-
ducibilities were of an acceptable magni-
tude (.75, .82, and .80, respectively),
must be accepted with caution, because
only three items scaled on each
dimension.


TABLE 2

SUMMARY DIMENSIONS DERIVED BY MULTIPLE SCALOGRAM ANALYSIS FROM THE WORD RATING
LIST REPRESENTING THE ACADEMIC SELF-CONCEPTS OF COMBINED FEMALE
UNDERACHIEVERS AND OVERACHIEVERS
(N = 312)

D1 Achievement via traditional academic role taking
   Sample items (Teachers feel that I am): Reliable, Responsible, Exacting, Easily distracted
   Interpretive emphasis: Carries out task immediately upon assignment. Is orderly,
   efficient, and intelligent.

D2 Achievement via intrinsic motivation
   Sample items (Teachers feel that I am): An achiever, Competitive, Purposeful
   Interpretive emphasis: Motivated by competition. Mastery of subject matter is
   primary concern.

D3 Achievement via academic independence
   Sample items (Teachers feel that I am): Not persuadable, Not average, Contented
   Interpretive emphasis: Nonconformist, independent and not average. Pursues
   own academic interests.

D4 Achievement via teacher conformity
   Sample items (Teachers feel that I am): Logical, Consistent, Studious
   Interpretive emphasis: Identification with teacher as significant other. Emphasis
   on meeting teacher’s specified requirements.

D5 Achievement via intellectualizing
   Sample items (Teachers feel that I am): Serious, A thinker, Successful
   Interpretive emphasis: Describes an individual who feels she is content to think,
   contemplate and in general abstractly investigate academic problems.


The female student characterized in
the third dimension (D3) is content to
be independent of normal classroom ac- 
tivity, and pursues academic interests 
which may or may not be similar to those 
of the teacher or classmates. Sufficient 
overlap with teacher’s goals, however, 
allows an acceptable level of achieve- 
ment. 

The fourth dimension (D4) describes
a girl who sees herself as “logical,” “con-
sistent,” and “studious.” She is not
competitive with teacher standards, but
plods along doing what is required. D4
is similar to D1, but the focus is on the
teacher, rather than the “totality” of the
academic situation as it might be related
to successful performance. The interpre-
tation of general conformity by meeting
teacher requirements is implied.
The fifth female dimension (D5) de-
scribes a female student who sees her-
self as content to think, contemplate,
and abstractly investigate academic
problems. The item content of D5 im-
plies a self-concept of intellectual.


Independence of Dimensions 


Intercorrelations among dimensions 
for each sex were computed to determine 
orthogonality and are summarized in 
Table 3. 

All male dimensions demonstrated a
high positive relationship with D1, indi-
cating the magnitude of a general aca-
demic self-concept type. The relatively
low correlation between D2 and D4
might indicate the presence of at least
two somewhat independent self-concepts
of conformity associated with achieve-
ment.

With the exception of D3, the female
dimensions appear to have more com- 
mon than unique variance. Dimension 
three, academic independence, might 
also be viewed as a type of noncon- 
formity. The need for further explora- 
tion of the role of conformity in the 
academic self-concept is evident. 


TABLE 3 


INTERCORRELATIONS AMONG DIMENSIONS 
OBTAINED BY MULTIPLE SCALOGRAM 
ANALYSIS FOR MALES AND FEMALES 


Dimension Ds Ds; dD, Ds 


D, 72 13 ; 61 
Ds 5 jl j 67 
Ds 38 46 7 02 
Dy ote Jil 68 4l 


Note. Intercorrelations below the diagonal are male
(N = 100), and above the diagonal are female
(N = 100). Values are positive unless otherwise
indicated.


CONCLUSIONS 


Three conclusions are extracted from 
the results of the present study: (a) it 
is possible to identify a set of reliable 
theory-derived items which significantly 
discriminate between underachieving 
and overachieving (high, low motivated) 
eleventh grade high school students; 
(b) notwithstanding the high positive
intercorrelations, several relatively inde- 
pendent and interpretable dimensions 
were identified; and (c) both common 
and unique items are found between the 
sexes when measuring academic self- 
concept. 
EFFECT OF LIST LENGTH AND INTERPOLATED
LEARNING ON THE LEARNING AND RECALL
OF FAST AND SLOW LEARNERS 1


LOWELL SCHOER 
State University of Iowa 


The present study was undertaken to determine whether differential 
susceptibility to inhibition may influence learning of fast vs. slow 
learners. Each S learned either a 7- or a 14-item list of paired adjec-
tives and recalled and relearned it 24 hours later. 50% of the Ss learned
a 9-item interpolated list just prior to the recall of the 7- or 14-item list
learned originally. A significant interaction between ability level and list
length was found when the criterion measures employed were trials-to-
learn and probability of occurrence of a correct response after 2 and 3
reinforcements. Interpolated learning did not affect the recall of fast and
slow learners differentially.


Recent work on the learning of paired 
adjectives has indicated certain differ- 
ences in the learning behavior of fast and 
slow learners: (a) slow learners require 
more reinforcements to achieve a given 
criterion than do fast learners (Stroud & 
Schoer, 1959), (b) slow learners are not
so likely as fast learners once they make 
a correct response to repeat it on the 
subsequent trial (Underwood, 1954), 
(c) increasing list length affects slow 
learners more adversely than it does fast 
learners (Stroud & Carter, 1961). 

Two explanations for these findings 
have been suggested: (a) the slow 
learner does not derive as much benefit 
from a reinforcement as does the fast 
learner, (b) the slow learner is more
susceptible than the fast learner to the 
inhibition that may arise as the result 
of having to respond to the other items 
in the list between successive presenta- 
tions of a given item. The first of these 
seems adequate to explain the facts that 
slow learners require more reinforce- 
ments to learn an item and that they 
are not so likely, once they have made 


1 This article is taken from a doctoral dis- 
sertation completed at the State University of 
Iowa. The author wishes to acknowledge the 
aid given him by James B. Stroud, under whose 
direction the dissertation was done.


a correct response, to repeat it on the 
subsequent trial; but were it the sole 
factor operating, increasing list length 
should not result in a disproportionate 
increase in list difficulty for the slow 
learner. This suggests, then, that differ- 
ential susceptibility to inhibition may 
account for at least some of the differ- 
ence in the learning speed of fast and 
slow learners. 

In learning a list of items in the con- 
ventional manner the subject, after he 
has responded to Item 1, responds to all 
the other items in the list before that 
item appears again. These other items 
represent interpolated material insofar 
as the learning and recall of Item 1 are 
concerned. The fact that in learning a 
12-item list more trials, more presenta- 
tions of each individual item, and more 
reinforcements per item are required 
than in learning a six-item list is readily 
explainable on the basis of inhibition 
phenomena. Further, the fact that in- 
creasing list length affects slow learners 
more adversely than fast learners in 
these respects indicates that the slow 
learner is more susceptible to inhibition 
phenomena in that, as list length in- 
creases, more items appear between sub- 
sequent presentations of a given item 
and the opportunity for inhibition to
operate is increased. 

The results of Stroud and Carter 
(1961) support differential susceptibility 
to inhibition as a possible cause of the 
difference between fast and slow learners 
but the learning method employed was 
that of revised lists, i.e., each item was 
deleted from the learning list after the 
subject had responded to it correctly on 
two trials. It seemed desirable to in- 
vestigate the generality of this phenome- 
non by determining whether or not it 
would appear when the more conven-
tional procedure of learning intact lists
was used. This was one of the purposes
of the present investigation. A some-
what more direct indication of the degree
to which fast and slow learners are
affected by inhibition would be given by
determining the effect of interpolated
learning on the recall of the two groups
after 24 hours. This was the second
purpose of this investigation. The fol-
lowing specific hypotheses were tested:


1. Because increasing list length in- 
creases the opportunity for inhibition to 
operate during learning, and the slow 
learner is hypothesized to be more sus- 
ceptible than the fast learner to the 
effects of this inhibition, the slow learner 
will be more adversely affected by in- 
creased list length than will the fast 
learner. 

2. If the slow learner is more sus- 
ceptible to inhibition phenomena than 
the fast learner, the recall of the slow 
learner will be more adversely affected 
than the recall of the fast learner by 
learning material interpolated between 
the learning and recall of the originally 
learned material. 


PROCEDURE 


A preliminary learning task was used to 
select a group of fast and slow learners from 
among the students in the undergraduate 
course in educational psychology at the State
University of Iowa. Fourteen pairs of adjec-
tives chosen from the Haagen list (Haagen,
1949) were presented by means of an overhead 
projector to the entire class. On the first two 
trials each pair was presented for a 2-second 
interval, and the students were asked to learn 
as many as they could; on the third trial only 
the first member of the pair was presented, and 
the students were asked to write down the 
second member; the fourth trial was another 
learning trial; and the fifth a second recall trial. 
The total number of correct responses on the 
two recall trials constituted the learning scores, 
and the top 30 and the bottom 30 in the class 
of 136 made up the pool of individuals from 
which the experimental subjects were randomly 
drawn. 

The individual learning sessions for the ex- 
perimental subjects began 9 weeks after the 
preliminary screening. Each subject learned a 
list on a given day and returned 24 hours later 
for a recall and relearning session. 

Three separate lists of seven paired adjec-
tives each formed the basic learning material.
The adjectives used were chosen from the 
Haagen list in such a way as to secure maxi- 
mum familiarity and homogeneous associative 
strength among the pairs in each of the three 
lists. The 14-item lists were made up by com- 
bining two of the 7-item lists so as to control 
for differences in difficulty between the 7- and 
14-item lists. 

The three lists used for interpolated learning 
consisted of nine pairs of adjectives each, also 
chosen from the Haagen list. These nine-item 
lists were composed of five pairs of adjectives 
unrelated to those in the original learning and 
four pairs synonymous with four pairs in the
original learning list, the synonymous pairs
being included in an attempt to maximize inter- 
ference with recall. 

The material was presented to the subject by 
means of a Hull-type memory drum. The first 
word of each pair (stimulus word) appeared 
by itself for 2 seconds, then this stimulus word 
and the second word of the pair (response 
word) appeared together for 2 seconds. The 
second pair was then presented in the same 
manner, then the third, etc., through the list 
until all the pairs had been thus presented.
After a 4-second interval, a second presenta-
tion of the same list was begun with the words
presented in a different order so as to dis-
courage serial learning. The two orders were
alternated and presentations were continued
until the learning criterion of two consecutive
perfect trials was reached, that is, two consecutive
trials on which the subject, upon seeing only
the stimulus member of the pair, could correctly
give the response member and could do this for
all the pairs in the list.

On the first day all subjects first learned a 
warm-up list of three items to a criterion of 
one perfect trial and then immediately learned 
either a 7- or a 14-item list. On the second day 
half the subjects immediately recalled and re- 
learned the list they had learned the preceding 
day, the other half learned a second list (inter- 
polated learning) before recalling and relearn- 
ing the original list. The interpolated list was 
also learned to a criterion of two consecutive 
perfect trials. 


RESULTS 


The statistical design used to analyze 
the results was that of treatments by 
levels (Lindquist, 1953). 

The trials-to-learn are presented in 
Measure A of Table 1 and indicate that 
the procedure used in determining fast 
and slow learners accomplished that end, 
since the F associated with the difference 
between the fast and slow learners is 
50.20 which is significant at well beyond 
the .001 level. When list length is in- 
creased from 7 to 14 items the increase 
in trials-to-learn is 5.6 for the fast 
learner and 10.2 for the slow learner. 
The F for this interaction is 5.42 which 
is significant at greater than the .05 
level. 

Measures B, C, and D in Table 1 
show the probability of a correct re- 
sponse after given numbers of reinforce- 
ments when a reinforcement is defined 
as a previous correct response. These 
figures were derived from probability 
analysis which involves determining the 
proportion of times a subject made a 
correct response to the stimulus item on 
the trial succeeding the one on which he 
made a correct response the first, second, 
or third time. Because the standard
error of a proportion is a function of its
magnitude, i.e., the standard error of .90
based on N = 10 is .095 while the stand-
ard error of .50 based on N = 10 is .158,
analysis of variance procedures are not
applicable in that homogeneity of vari-
ance cannot be assumed. For purposes
of analysis, then, the proportions were
transformed to degrees by use of the
arcsin transformation. The analysis of
the transformed data yielded main ef-
fects of speed of learning significant at
greater than the .01 level after one, two,
and three reinforcements; the F being
22.02 after one, 10.11 after two, and
7.39 after three reinforcements. The F
for interaction was less than 1.0 (non-
significant) after the first reinforcement,
5.51 (p < .05) after the second, and
30.93 (p < .001) after the third rein-
forcement.


TABLE 1
LEARNING AND RECALL DATA FOR FAST AND
SLOW LEARNERS ON 7- AND 14-ITEM LISTS

                                          Subjects
Measure                               Slow        Fast
                                     (N = 24)    (N = 24)

A  Trials-to-learn
     7-item list                                   4.0
     14-item list                                  9.6
B  Probability of occurrence of
   a correct response after 1
   reinforcement
     7-item list
     14-item list
C  Probability of occurrence of
   a correct response after 2
   reinforcements
     7-item list
     14-item list
D  Probability of occurrence of
   a correct response after 3
   reinforcements
     7-item list
     14-item list
E  Mean recall scores
     7-item list
     14-item list
F  Mean recall scores
     Preceded by interpolated
     learning
     No interpolated learning
G  Mean number of trials-to-
   relearn
     7-item list
     14-item list
H  Mean number of trials-to-
   relearn
     Preceded by interpolated
     learning
     No interpolated learning

[The remaining cell values are not legible in this copy.]

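
The probability analysis, the standard errors quoted for N = 10, and the transformation to degrees can be illustrated with a minimal Python sketch. The function names and the trial records are hypothetical. The form of the transformation, the angle whose sine is the square root of the proportion, is an assumption, since the text names only "the arcsin transformation."

```python
import math


def prob_correct_after(outcomes_per_item, k):
    """Proportion of items answered correctly on the trial that follows
    the trial of their k-th correct response (one reading of the text)."""
    hits = 0
    opportunities = 0
    for outcomes in outcomes_per_item:
        correct_seen = 0
        for trial, correct in enumerate(outcomes):
            if correct:
                correct_seen += 1
                if correct_seen == k:
                    # Count the item only if a following trial exists.
                    if trial + 1 < len(outcomes):
                        opportunities += 1
                        hits += outcomes[trial + 1]
                    break
    return hits / opportunities if opportunities else float("nan")


def se_proportion(p, n):
    """Standard error of a proportion p based on n observations."""
    return math.sqrt(p * (1.0 - p) / n)


def arcsine_degrees(p):
    """Assumed form of the transformation: arcsin(sqrt(p)) in degrees."""
    return math.degrees(math.asin(math.sqrt(p)))


# Hypothetical trial records (1 = correct, 0 = incorrect), one list per item.
items = [[0, 1, 1, 1], [1, 0, 1, 1], [0, 0, 1, 0]]

print(prob_correct_after(items, 1))
print(round(se_proportion(0.90, 10), 3), round(se_proportion(0.50, 10), 3))
print(arcsine_degrees(0.90), arcsine_degrees(0.50))
```

The two standard errors reproduce the .095 and .158 cited in the text, which is the dependence on magnitude that the transformation is meant to remove.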

The recall scores were analyzed by use 
of a three-dimensional design with speed 
of learning, list length and interpolated 
learning as the three variables (Lind- 
quist, 1953). The only significant F 
found was one of 9.12 associated with 
interpolated learning which is significant 
at greater than the .01 level. 

The relearning scores, like the recall 
scores, were analyzed by use of a three- 
dimensional design using the same three 
variables as were used with the recall 
data. The only significant F was one of 
9.12 associated with the main effect of 
interpolated learning, this F being sig- 
nificant at greater than the .01 level. 


DISCUSSION 


The data from this study, using the
method of intact-lists learned to a cri-
terion, are consistent with those from the
Stroud and Carter study (1961) where
the learning method employed was re-
vised lists; increasing list length results
in a greater increase in trials-to-learn 
for the slow learner than it does for the 
fast learner. In one sense, however, this 
interaction may not be an adequate test 
of Hypothesis 1. There is no question 
but that increasing list length from 7 
to 14 items results in a greater increase 
in trials-to-learn for slow than for fast 
learners but the proportion of increase 
is less for the slow than for the fast 
learner, i.e., it took the slow learners
2.16 times as many trials to learn the
long list as it had to learn the short list
while for the fast learners it took 2.40
times as many trials. There is, then, a
real question about the use of trials-to-
learn as a criterion measure in testing
Hypothesis 1; and perhaps the most
that can be said is that the relationship
in Measure A of Table 1 is not additive.
This nonadditivity may, however, be 
more an artifact due to the criterion used 
than to the slow learners being more ad- 
versely affected than the fast learner by 
increased list length. Probability of 
correct response after given numbers of 
reinforcements is not subject to the same 
limitation as is trials-to-learn, so the 
significant interactions in Measures C 
and D of Table 1 indicate more definite 
support of Hypothesis 1 than does the 
interaction in Measure A. 
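
The contrast between the absolute and the proportional increase can be made concrete with a small Python sketch. It assumes only the figures given in the text (increases of 5.6 and 10.2 trials, ratios of 2.40 and 2.16) and solves for the mean trials-to-learn they imply. The function name is hypothetical, and because the published ratios are rounded, the slow-learner values are approximate.

```python
def implied_means(increase, ratio):
    """Return (short-list mean, long-list mean) implied by
    long = short + increase and long = ratio * short."""
    short = increase / (ratio - 1.0)
    return short, short + increase


fast_short, fast_long = implied_means(5.6, 2.40)
slow_short, slow_long = implied_means(10.2, 2.16)

# The absolute increase is larger for slow learners (10.2 vs. 5.6 trials),
# while the proportional increase is larger for fast learners (2.40 vs. 2.16).
print(round(fast_short, 1), round(fast_long, 1))
print(round(slow_short, 1), round(slow_long, 1))
```

Under these assumptions the fast learners come out at about 4.0 and 9.6 trials, and the slow learners at roughly 8.8 and 19.0 trials.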

Hypothesis 2 must be rejected on the
basis of lack of interaction in Measure F of
Table 1. The significant main effect of
interpolated learning indicates that the 
interpolated learning did, in fact, inter- 
fere with recall, but the lack of interac- 
tion in the table means that the in- 
terpolated learning did not affect fast 
and slow learners to significantly differ- 
ent degrees. A possible explanation of 
this may be found in the learning method 
employed. When intact-lists are learned 
to a criterion there will be a number of 
items which the slow learner overlearns, 
i.e., he gets them right early in learning 
and continues responding to them on 
every trial until he learns the complete 
list. As a result of this overlearning it 
may be impossible for the interpolated 
learning to interfere with the recall of 
these items. If there are enough such 
items it would preclude the possibility 
of obtaining significant interaction in the 
table. 


PARAMETERS OF WORD FLUENCY TASKS*


ADRIENNE ERLEBACHER AND CHESTER W. HARRIS
Department of Educational Psychology 
University of Wisconsin 


Word fluency tests, consisting of a given initial pair of letters (bigram)
which are to be responded to by writing all the English words the S can
in a 5-minute period, were administered to 90 adult Ss. Bigrams were
chosen to form an orthogonal design on the basis of 2 characteristics:
(a) either a vowel followed by a consonant, or a consonant followed by a
vowel; and (b) pool size, or the number of commonly used English
words which begin with that bigram. Both effects were significant at the
.01 level. The consistency of these results with other studies of verbal
learning is commented on.


In 1938, Thurstone suggested that one
of the “primary” mental abilities is
“word fluency” (W). Since then, other
investigators using similar factorial tech-
niques have replicated this clustering
of tests such as first letters, first and last
letters, prefixes, suffixes, anagrams, etc.,
on a factor which they too labeled word
fluency. French’s (1951) summary of
factorial results indicates the status of
W as a factor as of that time. Guilford
(1956, 1959) has incorporated word
fluency into his structure of intellect as
a divergent thinking ability; as such, it
may be regarded as one (limited) aspect
of “creativity.”

Survey of the various factorial studies
in which W has been isolated or identi-
fied indicates that a fairly restricted
number of tests have been used as
“pure” measures of this factor or ability.
Typically, the subject is asked to write
as many English words as he can that
fit the given restrictions, such as be-
ginning and/or ending with a given
letter or letters (first letters, first and
last letters, etc.), or including some or all
of a set of given letters (anagrams). The
subject’s score is generally the number
of responses, with no scanning to correct
for repetitions or the “goodness” of the
response. Thus, this is a production task
generally scored without reference to
quality of response.


1 Delivered at the annual meeting of the
American Educational Research Association,
Atlantic City, N. J., February 19, 1962.

We are grateful to the Research Com-
mittee, Graduate School, University of Wis-
consin, for support of the research reported
here.


The purpose of this paper is to suggest 
that there may be some neglected ques- 
tions regarding word fluency and word 
fluency tasks. The specific question 
raised here is that of the predictability 
of the mean number of responses to such 


tasks for a suitably defined group of 
human beings. Our interest in the pre- 
dictability of the average number of 
responses to word fluency tasks stems, 
in part, from an interest in language or 


verbal learning. It seems obvious that 
word fluency tasks measure some aspect 
of the verbal learning achieved by the 
subject during his prior history. What 
this aspect is, is not well-defined at the 
present. Although it is some kind of 
“knowledge” of words, it is not the same 
kind of “knowledge” of words that is 
elicited by measures of the “verbal
comprehension” factor (V), such as vo- 
cabulary tests. The factor studies clearly 
support this distinction. Thus the in- 
triguing question of why V and W tasks 
remain distinct factorially arises. We 
hope eventually to throw some light on 
this question; however, the initial stud- 
ies we are now reporting begin at a 
simple level in an effort to understand 
better the word fluency tasks themselves.
Subsequently we expect to extend this 
mode of inquiry to the vocabulary tests. 


Possible Parameters of Word Fluency 
Tasks 


The following aspects of word fluency 
tasks appear to be manipulable at the 
choice of the investigator: 

1. Specific character of the instruc-
tions to the subject and the fore-exercise
or familiarization task

2. Time allowed for responding 

3. Mode of presentation of the stimu- 
lus: auditory versus visual 

4. Mode of response: oral or written 

5. Type of tasks: first letter, first and 
last letter, suffixes, anagrams, etc. 

6. Within a type of task, the task 
item or items: e.g., words beginning 
with t, words beginning with a, etc.

The studies reported here fixed the 
first five of these and varied the sixth, 
the task item. We chose a particular 
type of task, which we called a bigram 
task. The subject was asked to write, 
in a period of 5 minutes, all the English 
words he could think of that begin with 
a particular pair of letters, or bigram. 
We expected to find differences in aver- 
age productivity associated with differ- 
ent bigrams; i.e., we expected to find 
differences in difficulty for bigram task 
items. The problem we set was that of 
finding one or more principles for choos- 
ing bigram task items that would yield 
predictable (at least in a rank-order 
sense) differences in average produc- 
tivity. Our initial hypothesis was the 
obvious one that “pool size” or the 
number of English words actually be- 
ginning with a specified bigram would 
be a predictor of task item difficulty. 


PROCEDURE 


A survey was made of Thorndike and Lorge
(1944), Part I, which provides a list of the
words which occur at least once per million
words in a large sample of written matter. The
number of listed words which begin with each
of the 676 bigrams was recorded and desig-
nated as the pool size associated with that
bigram. If, as hypothesized, pool size is in-
versely related to difficulty of production of
words called for in the bigram task, this list
would serve as an index of bigram difficulty.

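
The pool-size tabulation described above amounts to counting, for each of the 26 × 26 = 676 possible bigrams, the listed words that begin with it. The following Python sketch shows one way this could be done. The function name and the short word list are hypothetical stand-ins, not the Thorndike and Lorge list itself.

```python
from collections import Counter
from itertools import product
from string import ascii_lowercase


def pool_sizes(words):
    """Map each of the 676 two-letter bigrams to the number of
    listed words that begin with it (zero if none do)."""
    counts = Counter(
        w[:2].lower() for w in words if len(w) >= 2 and w[:2].isalpha()
    )
    return {
        a + b: counts.get(a + b, 0)
        for a, b in product(ascii_lowercase, repeat=2)
    }


# Hypothetical miniature word list, for illustration only.
sample_words = ["gate", "game", "Garden", "address", "so", "a", "enter"]
sizes = pool_sizes(sample_words)

print(len(sizes), sizes["ga"], sizes["ad"], sizes["zz"])
```

Sorting the resulting dictionary by count would give the rank order of bigrams by pool size that the hypothesis treats as an index of difficulty.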
Two preliminary studies were made. The
first study used 10 bigrams varying in pool
size from 20 to 404 words. The results of an
analysis of variance demonstrated a significant
difference among bigrams. When grouped
roughly by pool size (20, 60 and 200+ words
available) an overall increase in production as
a function of pool size could be discerned. It
was not, however, without exception. A
“negative” replication was next tried, using
seven different bigrams, all of approximately
the same pool size. As we anticipated, there
was no statistically significant difference among
means for the seven bigrams. However, we ob-
served that when we sorted the seven bigrams
into two groups, a vowel followed by a con-
sonant (vc) and a consonant followed by a
vowel (cv), production was consistently
higher for the cv bigrams. The main study re-
ported here was then designed to investigate
simultaneously pool size and bigram structure
as factors in the production of responses to
bigram tasks. The bigrams chosen for the
main study were:

OS and TU: Pool size of about 60 words
AD and GA: Pool size of about 110 words
EN and SO: Pool size of about 150 words

As in the preliminary studies, subjects were
chosen from students enrolled in education
courses at the University of Wisconsin. Thirty-
six were females and 54 were males. Nearly all
were graduate students or participants in the
National Defense Education Act Guidance
Institute being held at the University of
Wisconsin at that time. None were foreign
students.

A repeated measures design was used to 
maximize the amount of data which could be 
gathered from a small number of subjects. A 
Latin square was used so that sequence effects 
and variance due to the ordinal position of the 
presentation of the bigram might be isolated 
from variance due to individual task items. A 
randomly chosen 6 × 6 Latin square was re-
plicated six times for females and nine times
for males. Data were gathered in classes with
the cooperation of the instructors. Each test
booklet contained, on the cover page, instruc-
tions and a fore-exercise (the same for all
students). The 6 pages of the test booklet
each contained one of the six bigrams printed
in the upper left-hand corner. Following the
administration of the fore-exercise, students 
were given 5 minutes to produce words begin- 
ning with the bigram on the first page, then 
5 minutes to produce words beginning with the 
bigram on the second page, and so forth. The 
order of presentation of the six bigrams was 
randomly assigned to subjects of the same sex 
within a given class. 
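
The layout of the design can be sketched in Python as follows. The sketch draws a 6 × 6 Latin square by shuffling the rows, columns, and symbols of a cyclic square. This is an assumed construction for illustration, not necessarily how the authors chose their square, and it does not sample uniformly from all Latin squares. Each row is read as the presentation order for one subject, so six replications for females and nine for males account for the 36 + 54 = 90 subjects.

```python
import random

BIGRAMS = ["OS", "TU", "AD", "GA", "EN", "SO"]


def random_latin_square(items, rng):
    """Shuffle rows, columns, and symbols of a cyclic Latin square."""
    n = len(items)
    rows = list(range(n))
    cols = list(range(n))
    symbols = list(items)
    rng.shuffle(rows)
    rng.shuffle(cols)
    rng.shuffle(symbols)
    return [[symbols[(r + c) % n] for c in cols] for r in rows]


def is_latin_square(square):
    """True if every symbol occurs exactly once per row and per column."""
    n = len(square)
    symbols = set(square[0])
    if len(symbols) != n:
        return False
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all(set(col) == symbols for col in zip(*square))
    return rows_ok and cols_ok


square = random_latin_square(BIGRAMS, random.Random(0))
for order in square:
    print(" ".join(order))  # one presentation sequence per subject

# Six replications for females and nine for males, six sequences each.
n_subjects = 6 * len(square) + 9 * len(square)
print(n_subjects)
```

Because every bigram appears once in each ordinal position within a square, variance due to position can be separated from variance due to the task items.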


RESULTS 


The data reported here are based on
an uncensored count of each subject’s
word production. An analysis of vari-
ance indicated that there were four
sources of variation significant beyond
the .01 level. None of the remaining Fs
were significant at the .05 level. Three
of the four significant sources were:
(a) pool size, (b) bigram
structure, and (c) the interaction of pool
size and bigram structure. Word pro-
duction increased as a function of pool
size, and the cv means were generally
higher than the vc means. A Duncan
range test (1955) demonstrated that
with one exception each of the six bigram
means was significantly different (be-
yond the .01 level) from each of the
others. The vc bigram EN (M = 16.43)
did not differ significantly from AD
(M = 16.32).


TABLE 1

ANALYSIS OF VARIANCE SUMMARY TABLE

Source                              df       MS         F

Subjects
  Sex
  Sequence
  Sex × sequence
  Subjects/sex and sequence         78     119.98
Ordinal position                           111.03
Ordinal position × sex                       3.91       a
Tasks
  Pool size                               2333.24    189.85*
  Structure                               2949.34    239.98*
  Pool size × structure                     92.06      7.49*
Sex × pool size                             12.46      1.01
Sex × structure                             21.84      1.78
Sex × pool size × structure          2      29.16
Square uniqueness                   20      17.89
Sex × square uniqueness             20       3.96
Residual                           390      12.29

a Less than one.
* Significant beyond the .01 level.

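
The reported F ratios for the task effects can be checked by dividing each mean square by an error mean square. The Python sketch below assumes that 12.29 is the residual mean square and that it served as the error term for these effects. Neither point is stated in the text, but both are consistent with the tabled values.

```python
# Assumption: 12.29 is the residual mean square used as the error term.
RESIDUAL_MS = 12.29

# Mean squares for the task effects as given in the summary table.
MEAN_SQUARES = {
    "pool size": 2333.24,
    "structure": 2949.34,
    "pool size x structure": 92.06,
}


def f_ratio(mean_square, error_ms=RESIDUAL_MS):
    """F ratio of an effect mean square to the error mean square."""
    return mean_square / error_ms


for source, ms in MEAN_SQUARES.items():
    print(source, round(f_ratio(ms), 2))
```

The three quotients round to 189.85, 239.98, and 7.49, the values reported for pool size, structure, and their interaction.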

The fourth source of significant varia- 
tion was ordinal position. Comparison 
of the means of the six ordinal positions 
showed that production increased 
through the fifth task, and then dropped 
off. The means from first through sixth 
position were 15.05, 16.58, 16.70, 17.40, 
18.23, and 17.68, respectively. The sig- 
nificant ordinal position effect indicates 
that performance differed depending on 
whether a task came first, second, third, 
etc. A Duncan range test indicated that 
performance on tasks in Ordinal Posi- 
tion 1 was significantly poorer than per- 
formance in any other position, and 
performance on tasks in Positions 2 
and 3 was significantly poorer than on 
those in Position 5. 

The mean word production for males 
and females (17.53 and 16.06, respec- 
tively) was not significantly different, 
and there were no significant interactions 
with sex. Neither were there significant 
differences attributable to differences in 
sequence. 


DISCUSSION 


The results of the third study, re- 
ported here in detail, are interesting in 
several respects. For one, although these 
are verbal tasks, sex differences were not 
observed, either as a main effect or as an 
interaction with pool size or bigram 
structure. For another, our initial hy- 
pothesis that mean word production in 
bigram tasks is associated with pool size 
is given additional support; the diffi- 
culty of such tasks can be fairly success- 
fully ordered in advance from knowledge 
of pool size. 

Bigram structure, which we discovered
somewhat accidentally from the data of
the second study, also proved to be a 
factor determining difficulty of these 
tasks. Now that this factor has been 
identified, we observe that it may also 
be inferred from data presented by Un- 
derwood and Schulz (1960). For ex- 
ample, they report (Appendix F) the 
number of times each of the letters of 
the alphabet was given as a response to 
each of the possible pairs of initial 
letters; very generally, when the initial 
pair of letters is a vc bigram, the re- 
sponse tends to be a vowel more often 
than when the initial pair is a cv bigram. 
In particular, for the six bigrams used 
in this study, we observe these responses 
in the Underwood and Schulz data: 

OS: Responded to with a vowel 56
times, a consonant 216 times
TU: Responded to with a vowel 30
times, a consonant 241 times
AD: Responded to with a vowel 60
times, a consonant 213 times
GA: Responded to with a vowel 30
times, a consonant 241 times
EN: Responded to with a vowel 52
times, a consonant 219 times
SO: Responded to with a vowel 33
times, a consonant 239 times
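
The pattern in these counts can be summarized as the share of vowel responses for each bigram. The Python sketch below is illustrative only. It assumes the six pairs of counts belong to OS, TU, AD, GA, EN, and SO in that order, which is the order in which the bigrams are listed in the Procedure section. Only the OS label is explicit in this copy.

```python
# (structure, vowel responses, consonant responses); the assignment of
# counts to bigrams after OS is inferred from the order of the list.
RESPONSES = {
    "OS": ("vc", 56, 216),
    "TU": ("cv", 30, 241),
    "AD": ("vc", 60, 213),
    "GA": ("cv", 30, 241),
    "EN": ("vc", 52, 219),
    "SO": ("cv", 33, 239),
}


def vowel_share(vowel, consonant):
    """Proportion of third-letter responses that were vowels."""
    return vowel / (vowel + consonant)


shares = {b: vowel_share(v, c) for b, (_, v, c) in RESPONSES.items()}
vc_min = min(shares[b] for b, (s, _, _) in RESPONSES.items() if s == "vc")
cv_max = max(shares[b] for b, (s, _, _) in RESPONSES.items() if s == "cv")

for bigram, share in shares.items():
    print(bigram, RESPONSES[bigram][0], round(share, 3))
print(vc_min > cv_max)
```

Under that assignment every vc bigram draws a larger share of vowel responses than any cv bigram.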


There is a marked consistency in these 
particular “letter habits” as Underwood 
and Schulz call them. When we look 
at the structure of the English language 
itself, we see that this letter habit of 
following a vc bigram with a vowel dis- 
proportionately often may account in 
part for the lower average production of 
words, for the English words beginning 
with these particular bigrams tend to 
have a consonant, rather than a vowel, 
in the third place. Underwood and
Schulz (1960, p. 226) show this char-
acteristic of the language generally.

The significant interaction of bigram
structure and pool size must be con-
sidered. The data suggest that bigrams
EN and SO primarily account for this
interaction. It would seem reasonable to
believe that there is something like a
threshold, with respect to pool size, be-
yond which (in the 5-minute period we
set) the letter habit we have referred to
would no longer seriously limit the recall
of words. However, we found that AD
(pool size of 110) and EN (pool size of
150) had very similar means (approxi-
mately 16), whereas GA (pool size of
110) had a mean of 20.49 and SO (pool
size of 150) had a mean of 22.72, a
finding that contradicts this speculation.
Further investigation of this is needed.
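
The comparison can be laid out numerically with the four means reported for the two larger pool sizes. The Python sketch below rests on one reading of the threshold idea, namely that the advantage of cv over vc bigrams should shrink as pool size grows. That reading is an assumption, and the function name is hypothetical.

```python
# Mean word production reported for the two larger pool sizes.
MEANS = {
    ("vc", 110): 16.32,  # AD
    ("cv", 110): 20.49,  # GA
    ("vc", 150): 16.43,  # EN
    ("cv", 150): 22.72,  # SO
}


def cv_advantage(pool_size):
    """Difference between the cv and vc means at a given pool size."""
    return MEANS[("cv", pool_size)] - MEANS[("vc", pool_size)]


gap_110 = cv_advantage(110)
gap_150 = cv_advantage(150)

print(round(gap_110, 2), round(gap_150, 2))
print(gap_150 > gap_110)
```

The cv advantage is about 4.17 words at a pool size of 110 and about 6.29 words at 150, so it widens rather than narrows.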

The ordinal position effect was not 
anticipated. When one recognizes that 
word fluency, as an independent cog- 
nitive ability, was identified in studies 
which made use of series of tests such 
as the one used here, without regard for 
their ordinal position, this finding may 
have grave implications. No task is 
necessarily independent of the position 
within the test battery in which it is 
given. If test performance differs as a 
function of test position, it is conceivable 
(though not necessary) that the ordinal 
position effect interacts with other vari- 
ables and that correlations may also 
vary as a function of test position within 
a battery. Thus, in a factor analytic 
study where a task is given to all people 
in the same position within the test 
battery, one can legitimately raise the 
question of the extent to which the fac- 
tors are functions of task similarities 
and differences, or are functions of the 
effect of being placed at certain positions 
within the battery. 